**Tomáš Vojnar · Lijun Zhang (Eds.)**

# **Tools and Algorithms for the Construction and Analysis of Systems**

**25th International Conference, TACAS 2019 Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2019 Prague, Czech Republic, April 6–11, 2019, Proceedings, Part II**

# Lecture Notes in Computer Science 11428

Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

#### Editorial Board Members

David Hutchison, UK Josef Kittler, UK Friedemann Mattern, Switzerland Moni Naor, Israel Bernhard Steffen, Germany Doug Tygar, USA

Takeo Kanade, USA Jon M. Kleinberg, USA John C. Mitchell, USA C. Pandu Rangan, India Demetri Terzopoulos, USA

## Advanced Research in Computing and Software Science Subline of Lecture Notes in Computer Science

Subline Series Editors

Giorgio Ausiello, University of Rome 'La Sapienza', Italy Vladimiro Sassone, University of Southampton, UK

Subline Advisory Board

Susanne Albers, TU Munich, Germany Benjamin C. Pierce, University of Pennsylvania, USA Bernhard Steffen, University of Dortmund, Germany Deng Xiaotie, Peking University, Beijing, China Jeannette M. Wing, Microsoft Research, Redmond, WA, USA

More information about this series at http://www.springer.com/series/7407


Editors Tomáš Vojnar Brno University of Technology Brno, Czech Republic

Lijun Zhang Chinese Academy of Sciences Beijing, China

ISSN 0302-9743 ISSN 1611-3349 (electronic) Lecture Notes in Computer Science ISBN 978-3-030-17464-4 ISBN 978-3-030-17465-1 (eBook) https://doi.org/10.1007/978-3-030-17465-1

LNCS Sublibrary: SL1 – Theoretical Computer Science and General Issues

© The Editor(s) (if applicable) and The Author(s) 2019. This book is an open access publication.

Open Access This book is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this book are included in the book's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the book's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland

### ETAPS Foreword

Welcome to the 22nd ETAPS! This is the first time that ETAPS took place in the Czech Republic in its beautiful capital Prague.

ETAPS 2019 was the 22nd instance of the European Joint Conferences on Theory and Practice of Software. ETAPS is an annual federated conference established in 1998, and consists of five conferences: ESOP, FASE, FoSSaCS, TACAS, and POST. Each conference has its own Program Committee (PC) and its own Steering Committee (SC). The conferences cover various aspects of software systems, ranging from theoretical computer science to foundations to programming language developments, analysis tools, formal approaches to software engineering, and security.

Organizing these conferences in a coherent, highly synchronized conference program enables participation in an exciting event, offering the possibility to meet many researchers working in different directions in the field and to easily attend talks of different conferences. ETAPS 2019 featured a new program item: the Mentoring Workshop. This workshop is intended to help students early in the program with advice on research, career, and life in the fields of computing that are covered by the ETAPS conference. On the weekend before the main conference, numerous satellite workshops took place and attracted many researchers from all over the globe.

ETAPS 2019 received 436 submissions in total, 137 of which were accepted, yielding an overall acceptance rate of 31.4%. I thank all the authors for their interest in ETAPS, all the reviewers for their reviewing efforts, the PC members for their contributions, and in particular the PC (co-)chairs for their hard work in running this entire intensive process. Last but not least, my congratulations to all authors of the accepted papers!

ETAPS 2019 featured the unifying invited speakers Marsha Chechik (University of Toronto) and Kathleen Fisher (Tufts University) and the conference-specific invited speakers (FoSSaCS) Thomas Colcombet (IRIF, France) and (TACAS) Cormac Flanagan (University of California at Santa Cruz). Invited tutorials were provided by Dirk Beyer (Ludwig Maximilian University) on software verification and Cesare Tinelli (University of Iowa) on SMT and its applications. On behalf of the ETAPS 2019 attendants, I thank all the speakers for their inspiring and interesting talks!

ETAPS 2019 took place in Prague, Czech Republic, and was organized by Charles University. Charles University was founded in 1348 and was the first university in Central Europe. It currently hosts more than 50,000 students. ETAPS 2019 was further supported by the following associations and societies: ETAPS e.V., EATCS (European Association for Theoretical Computer Science), EAPLS (European Association for Programming Languages and Systems), and EASST (European Association of Software Science and Technology). The local organization team consisted of Jan Vitek and Jan Kofron (general chairs), Barbora Buhnova, Milan Ceska, Ryan Culpepper, Vojtech Horky, Paley Li, Petr Maj, Artem Pelenitsyn, and David Safranek.

The ETAPS SC consists of an Executive Board, and representatives of the individual ETAPS conferences, as well as representatives of EATCS, EAPLS, and EASST. The Executive Board consists of Gilles Barthe (Madrid), Holger Hermanns (Saarbrücken), Joost-Pieter Katoen (chair, Aachen and Twente), Gerald Lüttgen (Bamberg), Vladimiro Sassone (Southampton), Tarmo Uustalu (Reykjavik and Tallinn), and Lenore Zuck (Chicago). Other members of the SC are: Wil van der Aalst (Aachen), Dirk Beyer (Munich), Mikolaj Bojanczyk (Warsaw), Armin Biere (Linz), Luis Caires (Lisbon), Jordi Cabot (Barcelona), Jean Goubault-Larrecq (Cachan), Jurriaan Hage (Utrecht), Rainer Hähnle (Darmstadt), Reiko Heckel (Leicester), Panagiotis Katsaros (Thessaloniki), Barbara König (Duisburg), Kim G. Larsen (Aalborg), Matteo Maffei (Vienna), Tiziana Margaria (Limerick), Peter Müller (Zurich), Flemming Nielson (Copenhagen), Catuscia Palamidessi (Palaiseau), Dave Parker (Birmingham), Andrew M. Pitts (Cambridge), Dave Sands (Gothenburg), Don Sannella (Edinburgh), Alex Simpson (Ljubljana), Gabriele Taentzer (Marburg), Peter Thiemann (Freiburg), Jan Vitek (Prague), Tomas Vojnar (Brno), Heike Wehrheim (Paderborn), Anton Wijs (Eindhoven), and Lijun Zhang (Beijing).

I would like to take this opportunity to thank all speakers, attendants, organizers of the satellite workshops, and Springer for their support. I hope you all enjoy the proceedings of ETAPS 2019. Finally, a big thanks to Jan and Jan and their local organization team for all their enormous efforts enabling a fantastic ETAPS in Prague!

February 2019 Joost-Pieter Katoen ETAPS SC Chair ETAPS e.V. President

### Preface

TACAS 2019 was the 25th edition of the International Conference on Tools and Algorithms for the Construction and Analysis of Systems conference series. TACAS 2019 was part of the 22nd European Joint Conferences on Theory and Practice of Software (ETAPS 2019). The conference was held at the Orea Hotel Pyramida in Prague, Czech Republic, during April 8–11, 2019.

Conference Description. TACAS is a forum for researchers, developers, and users interested in rigorously based tools and algorithms for the construction and analysis of systems. The conference aims to bridge the gaps between different communities with this common interest and to support them in their quest to improve the utility, reliability, flexibility, and efficiency of tools and algorithms for building systems. TACAS 2019 solicited four types of submissions: research papers, case-study papers, regular tool papers, and tool-demonstration papers.


Paper Selection. This year, 164 papers were submitted to TACAS, among which 119 were research papers, 10 case-study papers, 24 regular tool papers, and 11 tool-demonstration papers. After a rigorous review process, with each paper reviewed by at least three Program Committee members, followed by an online discussion, the Program Committee accepted 29 research papers, 2 case-study papers, 11 regular tool papers, and 8 tool-demonstration papers (50 papers in total).

Artifact-Evaluation Process. The main novelty of TACAS 2019 was that, for the first time, artifact evaluation was compulsory for all regular tool papers and tool demonstration papers. For research papers and case-study papers, artifact evaluation was optional. The artifact evaluation process was organized as follows:

– Regular tool papers and tool demonstration papers. The authors of the 35 submitted papers of these categories of papers were required to submit an artifact alongside their paper submission. Each artifact was evaluated independently by three reviewers. Out of the 35 artifact submissions, 28 were successfully evaluated, which corresponds to an acceptance rate of 80%. The AEC used a two-phase reviewing process: Reviewers first performed an initial check to see whether the artifact was technically usable and whether the accompanying instructions were consistent, followed by a full evaluation of the artifact. The main criterion for artifact acceptance was consistency with the paper, with completeness and documentation being handled in a more lenient manner as long as the artifact was useful overall. The reviewers were instructed to check whether results are consistent with what is described in the paper. Inconsistencies were to be clearly pointed out and explained by the authors. In addition to the textual reviews, reviewers also proposed a numeric value about (potentially weak) acceptance/rejection of the artifact. After the evaluation process, the results of the artifact evaluation were summarized and forwarded to the discussion of the papers, so as to enable the reviewers of the papers to take the evaluation into account. In all but three cases, tool papers whose artifacts did not pass the evaluation were rejected.

– Research papers and case-study papers. For this category of papers, artifact evaluation was voluntary. The authors of each of the 25 accepted papers were invited to submit an artifact immediately after the acceptance notification. Owing to the short time available for the process and acceptance of the artifact not being critical for paper acceptance, there was only one round of evaluation for this category, and every artifact was assigned to two reviewers. The artifacts were evaluated using the same criteria as for tool papers. Out of the 18 submitted artifacts of this phase, 15 were successfully evaluated (83% acceptance rate) and were awarded the TACAS 2019 AEC badge, which is added to the title page of the respective paper if desired by the authors.

TOOLympics. TOOLympics 2019 was part of the celebration of the 25th anniversary of the TACAS conference. The goal of TOOLympics is to acknowledge the achievements of the various competitions in the field of formal methods, and to understand their commonalities and differences. A total of 24 competitions joined TOOLympics and were presented at the event. An overview and competition reports of 11 competitions are included in the third volume of the TACAS 2019 proceedings, which are dedicated to the 25th anniversary of TACAS. The extra volume contains a review of the history of TACAS, the TOOLympics papers, and the papers of the annual Competition on Software Verification.

Competition on Software Verification. TACAS 2019 also hosted the 8th International Competition on Software Verification (SV-COMP), chaired and organized by Dirk Beyer. The competition again had high participation: 31 verification systems with developers from 14 countries were submitted for the systematic comparative evaluation, including three submissions from industry. The TACAS proceedings include the competition report and short papers describing 11 of the participating verification systems. These papers were reviewed by a separate program committee (PC); each of the papers was assessed by four reviewers. Two sessions in the TACAS program (this year as part of the TOOLympics event) were reserved for the presentation of the results: the summary by the SV-COMP chair and the participating tools by the developer teams in the first session, and the open jury meeting in the second session.

Acknowledgments. We would like to thank everyone who helped to make TACAS 2019 successful. In particular, we would like to thank the authors for submitting their papers to TACAS 2019. We would also like to thank all PC members, additional reviewers, as well as all members of the artifact evaluation committee (AEC) for their detailed and informed reviews and, in the case of the PC and AEC members, also for their discussions during the virtual PC and AEC meetings. We also thank the Steering Committee for their advice. Special thanks go to the Organizing Committee of ETAPS 2019 and its general chairs, Jan Kofroň and Jan Vitek, to the chair of the ETAPS 2019 executive board, Joost-Pieter Katoen, and to the publication team at Springer.

March 2019 Tomáš Vojnar (PC Chair) Lijun Zhang (PC Chair) Marius Mikucionis (Tools Chair) Radu Grosu (Use-Case Chair) Dirk Beyer (SV-COMP Chair) Ondřej Lengál (AEC Chair) Ernst Moritz Hahn (AEC Chair)

### Organization

### Program Committee

Dirk Beyer LMU Munich, Germany Yu-Fang Chen Academia Sinica, Taiwan Maria Christakis MPI-SWS, Germany Leonardo de Moura Microsoft Research, USA Falk Howar TU Dortmund, Germany Stefan Kiefer University of Oxford, UK Bernhard Steffen TU Dortmund, Germany Zhendong Su ETH Zurich, Switzerland Meng Sun Peking University, China

Parosh Aziz Abdulla Uppsala University, Sweden Armin Biere Johannes Kepler University Linz, Austria Ahmed Bouajjani IRIF, Paris Diderot University, France Patricia Bouyer LSV, CNRS/ENS Cachan, Université Paris Saclay, France Alessandro Cimatti Fondazione Bruno Kessler, Italy Rance Cleaveland University of Maryland, USA Parasara Sridhar Duggirala University of North Carolina at Chapel Hill, USA Pierre Ganty IMDEA Software Institute, Spain Radu Grosu Vienna University of Technology, Austria Orna Grumberg Technion – Israel Institute of Technology, Israel Klaus Havelund NASA/Caltech Jet Propulsion Laboratory, USA Holger Hermanns Saarland University, Germany Marieke Huisman University of Twente, The Netherlands Radu Iosif Verimag, CNRS/University of Grenoble Alpes, France Joxan Jaffar National University of Singapore, Singapore Jan Kretinsky Technical University of Munich, Germany Salvatore La Torre Università degli studi di Salerno, Italy Kim Guldstrand Larsen Aalborg University, Denmark Anabelle McIver Macquarie University, Australia Roland Meyer TU Braunschweig, Germany Marius Mikučionis Aalborg University, Denmark Sebastian A. Mödersheim Technical University of Denmark, Denmark David Parker University of Birmingham, UK Corina Pasareanu CMU/NASA Ames Research Center, USA Sanjit Seshia University of California, Berkeley, USA Jan Strejcek Masaryk University, Czech Republic


### Program Committee and Jury—SV-COMP



### Artifact Evaluation Committee (AEC)


Le Quang Loc Teesside University, UK Rasool Maghareh National University of Singapore, Singapore Tobias Meggendorfer TU Munich, Germany Malte Mues TU Dortmund, Germany Chris Novakovic University of Birmingham, UK Thai M. Trinh Advanced Digital Sciences Center, Illinois at Singapore, Singapore Wytse Oortwijn University of Twente, The Netherlands Aleš Smrčka Brno University of Technology, Czech Republic Daniel Stan Saarland University, Germany Ming-Hsien Tsai Academia Sinica, Taiwan Jan Tušil Masaryk University, Czech Republic

Tuan Phong Ngo Uppsala, Sweden Ilina Stoilkovska TU Wien, Austria Pedro Valero IMDEA, Spain Maximilian Weininger TU Munich, Germany

### Additional Reviewers

Aiswarya, C. Albarghouthi, Aws Aminof, Benjamin Américo, Arthur Ashok, Pranav Atig, Mohamed Faouzi Bacci, Giovanni Bainczyk, Alexander Barringer, Howard Basset, Nicolas Bensalem, Saddek Berard, Beatrice Besson, Frédéric Biewer, Sebastian Bogomolov, Sergiy Bollig, Benedikt Bozga, Marius Bozzano, Marco Brazdil, Tomas Caulfield, Benjamin Chaudhuri, Swarat Cheang, Kevin Chechik, Marsha Chen, Yu-Fang Chin, Wei-Ngan Chini, Peter

Ciardo, Gianfranco Cohen, Liron Cordeiro, Lucas Cyranka, Jacek Čadek, Pavel Darulova, Eva Degorre, Aldric Delbianco, Germán Andrés Delzanno, Giorgio Devir, Nurit Dierl, Simon Dragoi, Cezara Dreossi, Tommaso Dutra, Rafael Eilers, Marco El-Hokayem, Antoine Faella, Marco Fahrenberg, Uli Falcone, Ylies Fox, Gereon Freiberger, Felix Fremont, Daniel Frenkel, Hadar Friedberger, Karlheinz Frohme, Markus Fu, Hongfei

Furbach, Florian Garavel, Hubert Ghosh, Bineet Ghosh, Shromona Gondron, Sebastien Gopinath, Divya Gossen, Frederik Goyal, Manish Graf-Brill, Alexander Griggio, Alberto Gu, Tianxiao Guatto, Adrien Gutiérrez, Elena Hahn, Ernst Moritz Hansen, Mikkel Hartmanns, Arnd Hasani, Ramin Havlena, Vojtěch He, Kangli He, Pinjia Hess, Andreas Viktor Heule, Marijn Ho, Mark Ho, Nhut Minh Holik, Lukas Hsu, Hung-Wei Inverso, Omar Irfan, Ahmed Islam, Md. Ariful Itzhaky, Shachar Jakobs, Marie-Christine Jaksic, Stefan Jasper, Marc Jensen, Peter Gjøl Jonas, Martin Kaminski, Benjamin Lucien Karimi, Abel Katelaan, Jens Kauffman, Sean Kaufmann, Isabella Khoo, Siau-Cheng Kiesl, Benjamin Kim, Eric Klauck, Michaela Kong, Hui Kong, Zhaodan

Kopetzki, Dawid Krishna, Siddharth Krämer, Julia Kukovec, Jure Kumar, Rahul Köpf, Boris Lange, Martin Le Coent, Adrien Lemberger, Thomas Lengal, Ondrej Li, Yi Lin, Hsin-Hung Lluch Lafuente, Alberto Lorber, Florian Lu, Jianchao Lukina, Anna Lång, Magnus Maghareh, Rasool Mahyar, Hamidreza Markey, Nicolas Mathieson, Luke Mauritz, Malte Mayr, Richard Mechtaev, Sergey Meggendorfer, Tobias Micheli, Andrea Michelmore, Rhiannon Monteiro, Pedro T. Mover, Sergio Mu, Chunyan Mues, Malte Muniz, Marco Murano, Aniello Murtovi, Alnis Muskalla, Sebastian Mutluergil, Suha Orhun Neumann, Elisabeth Ngo, Tuan Phong Nickovic, Dejan Nies, Gilles Noller, Yannic Norman, Gethin Nowack, Martin Olmedo, Federico Pani, Thomas Petri, Gustavo

Piazza, Carla Poli, Federico Poulsen, Danny Bøgsted Prabhakar, Pavithra Quang Trung, Ta Ranzato, Francesco Rasmussen, Cameron Ratasich, Denise Ravanbakhsh, Hadi Ray, Rajarshi Reger, Giles Reynolds, Andrew Rigger, Manuel Rodriguez, Cesar Rothenberg, Bat-Chen Roveri, Marco Rydhof Hansen, René Rüthing, Oliver Sadeh, Gal Saivasan, Prakash Sanchez, Cesar Sangnier, Arnaud Schlichtkrull, Anders Schwoon, Stefan Seidl, Martina Shi, Xiaomu Shirmohammadi, Mahsa Shoukry, Yasser Sighireanu, Mihaela Soudjani, Sadegh Spießl, Martin Srba, Jiri

Srivas, Mandayam Stan, Daniel Stoilkovska, Ilina Stojic, Ivan Su, Ting Summers, Alexander J. Tabuada, Paulo Tacchella, Armando Tang, Enyi Tian, Chun Tonetta, Stefano Trinh, Minh-Thai Trtík, Marek Tsai, Ming-Hsien Valero, Pedro van der Berg, Freark Vandin, Andrea Vazquez-Chanlatte, Marcell Viganò, Luca Villadsen, Jørgen Wang, Shuai Wang, Shuling Weininger, Maximilian Wendler, Philipp Wolff, Sebastian Wüstholz, Valentin Xu, Xiao Zeljić, Aleksandar Zhang, Fuyuan Zhang, Qirun Zhang, Xiyue

### Contents – Part II

#### Concurrent and Distributed Systems




#### Synthesis



### Contents – Part I

#### SAT and SMT I




#### SAT Solving and Theorem Proving




#### Verification and Analysis




# Concurrent and Distributed Systems

# **Checking Deadlock-Freedom of Parametric Component-Based Systems**

Marius Bozga, Radu Iosif, and Joseph Sifakis

Univ. Grenoble Alpes, CNRS, Grenoble INP (Institute of Engineering Univ. Grenoble Alpes), VERIMAG, 38000 Grenoble, France {Marius.Bozga,Radu.Iosif,Joseph.Sifakis}@univ-grenoble-alpes.fr

**Abstract.** We propose an automated method for computing inductive invariants used to prove deadlock freedom of parametric component-based systems. The method generalizes the approach for computing structural trap invariants from bounded to parametric systems with general architectures. It symbolically extracts trap invariants from interaction formulae defining the system architecture. The paper presents the theoretical foundations of the method, including new results for first-order monadic logic, and proves its soundness. It also reports on a preliminary experimental evaluation on several textbook examples.

Modern computing systems exhibit dynamic and reconfigurable behavior. To tackle the complexity of such systems, engineers extensively use architectures that enforce, by construction, essential properties, such as fault tolerance or mutual exclusion. Architectures can be viewed as parametric operators that take as arguments instances of components of given types and enforce a characteristic property. For instance, client-server architectures enforce atomicity and resilience of transactions, for any numbers of clients and servers. Similarly, token-ring architectures enforce mutual exclusion between any number of components in the ring.

Parametric verification is an extremely relevant and challenging problem in systems engineering. In contrast to the verification of bounded systems, consisting of a known set of components, there exist no general methods and tools successfully applied to parametric systems. Verification problems for very simple parametric systems, even with finite-state components, are typically intractable [10,16]. Most work in this area puts emphasis on limitations determined mainly by three criteria: (1) the topology of the architecture, (2) the coordination primitives, and (3) the properties to be verified.

The main decidability results reduce parametric verification to the verification of a bounded number of instances of finite state components. Several methods try to determine a cut-off size of the system, i.e. the minimal size for which if a property holds, then it holds for any size, e.g. Suzuki [20], Emerson and Namjoshi [15]. Other methods identify systems with well-structured transition relations, for which symbolic enumeration of reachable states is feasible [1], or reduce to known decidable problems, such as reachability in vector addition systems [16]. Typically, these methods apply to systems with global coordination. When theoretical decidability is not of concern, semi-algorithmic techniques such as *regular model checking* [2,17], SMT-based *bounded model checking* [3,14], *abstraction* [8,11] and *automata learning* [13] can be used to deal with more general classes of systems. The interested reader can find a complete survey on parameterized model checking by Bloem et al. [10].

The research leading to these results has received funding from the European Union Horizon 2020 research and innovation programme under grant agreement no. 700665 CITADEL (Critical Infrastructure Protection using Adaptive MILS) and no. 730086 ERGO (European Robotic Goal-Oriented Autonomous Controller).

This paper takes a different angle of attack to the verification problem, seeking generality of the type of parametric systems and focusing on the verification of a particular but essential property: *deadlock-freedom*. The aim is to come up with effective methods for checking deadlock-freedom, by overcoming the complexity blowup stemming from the effective generation of reachability sets. We briefly describe our approach below.

A system is the composition of a finite number of component instances of given types, using interactions that follow the Behaviour-Interaction-Priorities (BIP) paradigm [7]. To simplify the technical part, we assume that components and interactions are finite abstractions of real-life systems. An instance is a finite-state transition system whose edges are labeled by ports. The instances communicate synchronously via a number of simultaneous interactions involving a set of ports each, such that no data is exchanged during interactions. If the number of instances in the system is fixed and known in advance, we say that the system is *bounded*, otherwise it is *parametric*.

**Fig. 1.** Mutual exclusion example

For instance, the bounded system in Fig. 1a consists of the component types *Semaphore*, with one instance, and *Task*, with two instances. A semaphore goes from the free state $r$ to the taken state $s$ by an acquire action $a$, and vice versa from $s$ to $r$ by a release action $e$. A task goes from waiting $w$ to busy $u$ by action $b$, and vice versa by action $f$. For the bounded system in Fig. 1a, the interactions are $\{a, b_1\}$, $\{a, b_2\}$, $\{e, f_1\}$ and $\{e, f_2\}$, depicted with dashed lines. Since the number of instances is known in advance, we can view an interaction as a minimal satisfying valuation of the boolean formula $\Gamma = (a \wedge b_1) \vee (a \wedge b_2) \vee (e \wedge f_1) \vee (e \wedge f_2)$, where the port symbols are propositional variables. Because every instance has finitely many states, we can write a boolean formula $\Delta = [\neg r \vee \neg(w_1 \vee w_2)] \wedge [\neg s \vee \neg(u_1 \vee u_2)]$, this time over propositional state variables, which defines the configurations in which all interactions are disabled (deadlock). Proving that no deadlock configuration is reachable from the initial configuration $r \wedge w_1 \wedge w_2$ requires finding an over-approximation (invariant) $I$ of the reachable configurations, such that the conjunction $I \wedge \Delta$ is not satisfiable.

The basic idea of our method, supported by the D-Finder deadlock detection tool [9] for bounded component-based systems, is to compute an invariant straight from the interaction formula, without going through costly abstract fixpoint iterations. The invariants we are looking for are in fact solutions of a system of boolean constraints $\Theta(\Gamma)$, of size linear in the size of $\Gamma$ (written in DNF). In our example, $\Theta(\Gamma) = \bigwedge_{i=1,2} (r \vee w_i) \leftrightarrow (s \vee u_i)$. Finding the (minimal) solutions of this constraint can be done, as currently implemented in D-Finder, by exhaustive model enumeration using a SAT solver. Here we propose a more efficient solution, which consists in writing $\Theta(\Gamma)$ in DNF and removing the negative literals from each minterm. In our case, this gives the invariant $I = (r \vee s) \wedge \bigwedge_{i=1,2} (w_i \vee u_i) \wedge (r \vee u_1 \vee u_2) \wedge (s \vee w_1 \vee w_2)$, and $I \wedge \Delta$ is proved unsatisfiable using a SAT solver.
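The example can be cross-checked by brute force. The sketch below is neither the DNF-based procedure proposed here nor D-Finder's SAT-based enumeration: it simply enumerates every subset of places, keeps the minimal initially marked traps, and checks that no valuation satisfies both the resulting invariant and the deadlock condition. All names (`PLACES`, `INTERACTIONS`, `is_trap`, and so on) are hypothetical, and the pre/post sets of each interaction are read off Fig. 1a by hand.

```python
from itertools import combinations, product

# Places of the bounded system: semaphore states r, s and task states w_i, u_i.
PLACES = ["r", "s", "w1", "u1", "w2", "u2"]
INITIAL = {"r", "w1", "w2"}

# Each interaction as (pre-places, post-places), derived by hand from Fig. 1a.
INTERACTIONS = [
    ({"r", "w1"}, {"s", "u1"}),  # {a, b1}
    ({"r", "w2"}, {"s", "u2"}),  # {a, b2}
    ({"s", "u1"}, {"r", "w1"}),  # {e, f1}
    ({"s", "u2"}, {"r", "w2"}),  # {e, f2}
]

def is_trap(places):
    """Every interaction taking a token out of `places` puts one back in."""
    return all(not (pre & places) or bool(post & places)
               for pre, post in INTERACTIONS)

def minimal_marked_traps():
    """Inclusion-minimal traps that contain an initially marked place."""
    traps = [set(c)
             for k in range(1, len(PLACES) + 1)
             for c in combinations(PLACES, k)
             if is_trap(set(c)) and set(c) & INITIAL]
    return [t for t in traps if not any(other < t for other in traps)]

def deadlock(marking):
    """Delta: no interaction has all of its pre-places marked."""
    return all(not pre <= marking for pre, _ in INTERACTIONS)

def invariant_excludes_deadlock(traps):
    """Check that (every trap is marked) AND Delta has no model at all."""
    for bits in product([False, True], repeat=len(PLACES)):
        marking = {p for p, b in zip(PLACES, bits) if b}
        if all(t & marking for t in traps) and deadlock(marking):
            return False
    return True

traps = minimal_marked_traps()
print(sorted(sorted(t) for t in traps))
print(invariant_excludes_deadlock(traps))
```

The five traps found this way correspond to the five conjuncts of $I$. The enumeration is exponential in the number of places, so it is only useful as a check on small instances.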

The main contribution of this paper is the generalization of this invariant generation method to the parametric case. To understand the problem, consider the parametric system from Fig. 1, in which a *Semaphore* interacts with $n$ *Tasks*, where $n > 0$ is not known in advance. The interactions are described by a fragment of first order logic, in which the ports are either propositional or monadic predicate symbols, in our case $\Gamma = (a \wedge \exists i \,.\, b(i)) \vee (e \wedge \exists i \,.\, f(i))$. This logic, called *Monadic Interaction Logic* (MIL), is also used to express the constraints $\Theta(\Gamma)$ and compute their solutions. In our case, we obtain $I = (r \vee s) \wedge [\forall i \,.\, w(i) \vee u(i)] \wedge [r \vee \exists i \,.\, u(i)] \wedge [s \vee \exists i \,.\, w(i)]$. As in the bounded case, we can give a parametric description of deadlock configurations $\Delta = [\neg r \vee \neg \exists i \,.\, w(i)] \wedge [\neg s \vee \neg \exists i \,.\, u(i)]$ and prove that $I \wedge \Delta$ is unsatisfiable, using the decidability of MIL, based on an early small model property result due to Löwenheim [19]. In practice, we avoid the model enumeration suggested by this result and check the satisfiability of such queries using a decidable theory of sets with cardinality constraints [18], available in the CVC4 SMT solver [4].
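For this particular example, the unsatisfiability of $I \wedge \Delta$ can also be seen by hand with a two-case split on $r$. The derivation below is only an illustrative sketch for this one instance, not the decision procedure used in the paper; it relies on the stated assumption $n > 0$, i.e. a nonempty index domain.

```latex
\begin{align*}
&\textbf{Case } r.\ \text{From } \neg r \vee \neg\exists i\,.\,w(i) \text{ we get } \forall i\,.\,\neg w(i),
  \text{ hence } \forall i\,.\,u(i) \text{ by } \forall i\,.\,w(i) \vee u(i).\\
&\quad \text{Then } s \vee \exists i\,.\,w(i) \text{ forces } s,
  \text{ and } \neg s \vee \neg\exists i\,.\,u(i) \text{ forces } \forall i\,.\,\neg u(i),\\
&\quad \text{which contradicts } \forall i\,.\,u(i) \text{ because the index domain is nonempty } (n > 0).\\[4pt]
&\textbf{Case } \neg r.\ \text{From } r \vee s \text{ we get } s,
  \text{ so } \neg s \vee \neg\exists i\,.\,u(i) \text{ gives } \forall i\,.\,\neg u(i).\\
&\quad \text{Then } r \vee \exists i\,.\,u(i) \text{ is false, which contradicts } I.
\end{align*}
```

Only the first case uses nonemptiness: over an empty index domain, the valuation making both $r$ and $s$ true would satisfy $I \wedge \Delta$.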

The paper is structured as follows: Sect. 1 presents existing results for checking deadlock-freedom of bounded systems using invariants, Sect. 2 formalizes the approach for computing invariants using MIL, Sect. 3 introduces cardinality constraints for invariant generation, Sect. 4 presents the integration of the above results within a verification technique for parametric systems and Sect. 5 reports on preliminary experiments carried out with a prototype tool. Finally, Sect. 6 presents concluding remarks and future work directions. For reasons of space, all proofs are given in [12].

#### **1 Bounded Component-Based Systems**

A *component* is a tuple $\mathcal{C} = \langle \mathsf{P}, \mathsf{S}, s_0, \Delta \rangle$, where $\mathsf{P} = \{p, q, r, \ldots\}$ is a finite set of *ports*, $\mathsf{S}$ is a finite set of *states*, $s_0 \in \mathsf{S}$ is an initial state and $\Delta \subseteq \mathsf{S} \times \mathsf{P} \times \mathsf{S}$ is a set of *transitions* written $s \xrightarrow{p} s'$. To simplify the technical details, we assume there are *no two different transitions with the same port*, i.e. if $s_1 \xrightarrow{p_1} s_1', s_2 \xrightarrow{p_2} s_2' \in \Delta$ and $s_1 \neq s_2$ or $s_1' \neq s_2'$ then $p_1 \neq p_2$. In general, this restriction can be lifted, at the cost of cluttering the presentation.
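One possible encoding of this definition is sketched below. The class name `Component`, its field names, and the error messages are all hypothetical choices made for illustration. The constructor enforces the assumption that no port labels two different transitions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Component:
    """A component <P, S, s0, Delta> with transitions as (source, port, target)."""
    ports: frozenset
    states: frozenset
    initial: str
    transitions: frozenset

    def __post_init__(self):
        if self.initial not in self.states:
            raise ValueError("initial state must belong to the state set")
        used_ports = set()
        for src, port, dst in self.transitions:
            if src not in self.states or dst not in self.states:
                raise ValueError(f"unknown state in transition {(src, port, dst)}")
            if port not in self.ports:
                raise ValueError(f"unknown port in transition {(src, port, dst)}")
            # `transitions` is a set, so a repeated port means two different transitions.
            if port in used_ports:
                raise ValueError(f"port {port!r} labels two different transitions")
            used_ports.add(port)

# The Semaphore type of the running example: r --a--> s and s --e--> r.
semaphore = Component(
    ports=frozenset({"a", "e"}),
    states=frozenset({"r", "s"}),
    initial="r",
    transitions=frozenset({("r", "a", "s"), ("s", "e", "r")}),
)
```

Because each port then determines a unique source and target state, later constructions can look up a transition by its port alone.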

A *bounded system* $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^n, \Gamma \rangle$ consists of a fixed number ($n$) of components $\mathcal{C}^k = \langle \mathsf{P}^k, \mathsf{S}^k, s_0^k, \Delta^k \rangle$ and an *interaction formula* $\Gamma$, describing the allowed interactions. Since the number of components is known in advance, we write interaction formulae using boolean logic over the set of propositional variables $\mathsf{BVar} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} (\mathsf{P}^k \cup \mathsf{S}^k)$. Here we intentionally use the names of states and ports as propositional variables.

A *boolean interaction formula* is either $a \in \mathsf{BVar}$, $f_1 \wedge f_2$ or $\neg f_1$, where $f_i$ are formulae, for $i = 1, 2$. We define the usual shorthands $f_1 \vee f_2 \stackrel{\text{def}}{=} \neg(\neg f_1 \wedge \neg f_2)$, $f_1 \rightarrow f_2 \stackrel{\text{def}}{=} \neg f_1 \vee f_2$, $f_1 \leftrightarrow f_2 \stackrel{\text{def}}{=} (f_1 \rightarrow f_2) \wedge (f_2 \rightarrow f_1)$. A *literal* is either a variable or its negation and a *minterm* is a conjunction of literals. A formula is in *disjunctive normal form* (DNF) if it is written as $\bigvee_{i=1}^{n} \bigwedge_{j=1}^{m_i} \ell_{ij}$, where $\ell_{ij}$ is a literal. A formula is *positive* if and only if each variable occurs under an even number of negations, or, equivalently, its DNF form contains no negative literals. We assume interaction formulae of bounded systems to be always positive.

A *boolean valuation* $\beta : \mathsf{BVar} \rightarrow \{\top, \bot\}$ maps each propositional variable to either true ($\top$) or false ($\bot$). We write $\beta \models f$ if and only if $f$ evaluates to $\top$ when each boolean variable $a$ is replaced with $\beta(a)$ in $f$. We say that $\beta$ is a *model* of $f$ in this case and write $f \equiv g$ for $[\![f]\!] = [\![g]\!]$, where $[\![f]\!] \stackrel{\text{def}}{=} \{\beta \mid \beta \models f\}$. Given two valuations $\beta_1$ and $\beta_2$ we write $\beta_1 \subseteq \beta_2$ if and only if $\beta_1(a) = \top$ implies $\beta_2(a) = \top$, for each variable $a \in \mathsf{BVar}$. We write $f \equiv^{\mu} g$ for $[\![f]\!]^{\mu} = [\![g]\!]^{\mu}$, where $[\![f]\!]^{\mu} \stackrel{\text{def}}{=} \{\beta \in [\![f]\!] \mid \text{for all } \beta' : \beta' \subseteq \beta \text{ and } \beta' \neq \beta \text{ only if } \beta' \notin [\![f]\!]\}$ is the set of minimal models of $f$.
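The notion of minimal model can be made concrete on the interaction formula of the mutual exclusion example. In the sketch below a valuation is represented by the set of variables it maps to true, so that $\beta_1 \subseteq \beta_2$ becomes plain set inclusion. The function names and the encoding of $\Gamma$ as a Python predicate are assumptions made for illustration, and the enumeration is exponential in the number of variables.

```python
from itertools import product

def models(formula, variables):
    """All models of `formula`, each given as the frozenset of true variables."""
    found = []
    for bits in product([False, True], repeat=len(variables)):
        beta = dict(zip(variables, bits))
        if formula(beta):
            found.append(frozenset(v for v in variables if beta[v]))
    return found

def minimal_models(formula, variables):
    """Models that have no strictly smaller model (w.r.t. set inclusion)."""
    all_models = models(formula, variables)
    return {m for m in all_models if not any(other < m for other in all_models)}

PORTS = ["a", "e", "b1", "b2", "f1", "f2"]

def gamma(v):
    # Gamma = (a & b1) | (a & b2) | (e & f1) | (e & f2)
    return ((v["a"] and v["b1"]) or (v["a"] and v["b2"])
            or (v["e"] and v["f1"]) or (v["e"] and v["f2"]))

interactions = minimal_models(gamma, PORTS)
print(sorted(sorted(m) for m in interactions))
```

For a positive formula such as this one, the minimal models coincide with the inclusion-minimal minterms of its DNF, which is why they can be read as the interactions $\{a, b_1\}$, $\{a, b_2\}$, $\{e, f_1\}$ and $\{e, f_2\}$.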

#### **1.1 Execution Semantics of Bounded Systems**

We use 1-safe marked Petri nets to define the set of executions of a bounded system. A *Petri net* (PN) is a tuple $N = \langle S, T, E \rangle$, where $S$ is a set of *places*, $T$ is a set of *transitions*, $S \cap T = \emptyset$, and $E \subseteq (S \times T) \cup (T \times S)$ is a set of *edges*. The elements of $S \cup T$ are called *nodes*. For a node $n$, let ${}^{\bullet}n \stackrel{\text{def}}{=} \{m \in S \cup T \mid E(m, n) = 1\}$ and $n^{\bullet} \stackrel{\text{def}}{=} \{m \in S \cup T \mid E(n, m) = 1\}$, and lift these definitions to sets of nodes, as usual.

**Fig. 2.** PN for mutual exclusion

A *marking* for a PN $N = \langle S, T, E \rangle$ is a function $\mathrm{m} : S \to \mathbb{N}$. A *marked Petri net* is a pair $\mathcal{N} = (N, \mathrm{m}_0)$, where $\mathrm{m}_0$ is the *initial marking* of $N = \langle S, T, E \rangle$. We assume that the reader is familiar with the standard execution semantics of a marked PN. A marking $\mathrm{m}$ is *reachable* in $\mathcal{N}$ if and only if there exists a sequence of transitions leading from $\mathrm{m}_0$ to $\mathrm{m}$. We denote by $\mathcal{R}(\mathcal{N})$ the set of reachable markings of $\mathcal{N}$. A set of markings $\mathcal{M}$ is an *invariant* of $\mathcal{N} = (N, \mathrm{m}_0)$ if and only if $\mathrm{m}_0 \in \mathcal{M}$ and $\mathcal{M}$ is closed under the transitions of $N$. A marked PN $\mathcal{N}$ is 1*-safe* if $\mathrm{m}(s) \le 1$, for each $s \in S$ and each $\mathrm{m} \in \mathcal{R}(\mathcal{N})$. In the following, we consider only marked PNs that are 1-safe. In this case, any (necessarily finite) set of reachable markings can be defined by a boolean formula, which identifies markings with the induced boolean valuations. A marking $\mathrm{m}$ is a *deadlock* if no transition is enabled in $\mathrm{m}$; we denote by $\mathcal{D}(\mathcal{N})$ the set of deadlocks of $N$. A marked PN $\mathcal{N}$ is *deadlock-free* if and only if $\mathcal{R}(\mathcal{N}) \cap \mathcal{D}(\mathcal{N}) = \emptyset$. A sufficient condition for deadlock freedom is $\mathcal{M} \cap \mathcal{D}(\mathcal{N}) = \emptyset$, for some invariant $\mathcal{M}$ of $\mathcal{N}$.
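The standard execution semantics of a 1-safe marked PN can be sketched as follows. This is a minimal illustration, not the construction used in the paper: the net below is a hypothetical mutual-exclusion example (one controller with places `r`, `s` and two workers with places `w1`, `u1`, `w2`, `u2`), markings are represented by the set of marked places, and all names are assumptions.

```python
# Hypothetical 1-safe net: a transition is a (pre-places, post-places) pair.
TRANSITIONS = [
    (frozenset({"r", "w1"}), frozenset({"s", "u1"})),
    (frozenset({"r", "w2"}), frozenset({"s", "u2"})),
    (frozenset({"s", "u1"}), frozenset({"r", "w1"})),
    (frozenset({"s", "u2"}), frozenset({"r", "w2"})),
]
INITIAL = frozenset({"r", "w1", "w2"})

def enabled(marking, transition):
    """A transition is enabled when all of its pre-places are marked."""
    pre, _post = transition
    return pre <= marking

def fire(marking, transition):
    """Remove the tokens of the pre-places and add tokens to the post-places."""
    pre, post = transition
    return (marking - pre) | post

def reachable(initial, transitions):
    """Explore the reachable markings by a simple worklist search."""
    seen, frontier = {initial}, [initial]
    while frontier:
        marking = frontier.pop()
        for t in transitions:
            if enabled(marking, t):
                successor = fire(marking, t)
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
    return seen

def deadlocks(markings, transitions):
    """Markings of the given set in which no transition is enabled."""
    return {m for m in markings if not any(enabled(m, t) for t in transitions)}

REACHABLE = reachable(INITIAL, TRANSITIONS)
print(len(REACHABLE), "reachable markings,",
      len(deadlocks(REACHABLE, TRANSITIONS)), "reachable deadlocks")
```

Explicit exploration of this kind is only feasible for small bounded systems; the invariant-based condition stated above avoids computing the set of reachable markings.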

In the rest of this section, we fix a bounded system $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^n, \Gamma \rangle$, where $\mathcal{C}^k = \langle \mathsf{P}^k, \mathsf{S}^k, s_0^k, \Delta^k \rangle$, for all $k \in [1, n]$, and $\Gamma$ is a positive boolean formula over propositional variables denoting ports. The set of executions of $\mathcal{S}$ is given by the 1-safe marked PN $\mathcal{N}_{\mathcal{S}} = (N, \mathrm{m}_0)$, where $N = (\bigcup_{i=1}^{n} \mathsf{S}^i, T, E)$, $\mathrm{m}_0(s) = 1$ if and only if $s \in \{s_0^i \mid i \in [1, n]\}$, and $T$, $E$ are as follows. For each minimal model $\beta \in [\![\Gamma]\!]^{\mu}$, we have a transition $\mathrm{t}_\beta \in T$ and edges $(s_i, \mathrm{t}_\beta), (\mathrm{t}_\beta, s_i') \in E$, for all $i \in [1, n]$ such that $s_i \xrightarrow{p_i} s_i' \in \Delta^i$ and $\beta(p_i) = \top$. Moreover, nothing else is in $T$ or $E$.

For example, the marked PN from Fig. 2 describes the set of executions of the bounded system from Fig. 1a. Note that each transition of the PN corresponds to a minimal model of the interaction formula $\Gamma = a \wedge b_1 \vee a \wedge b_2 \vee e \wedge f_1 \vee e \wedge f_2$, or equivalently, to the set of (necessarily positive) literals of some minterm in the DNF of $\Gamma$.

#### **1.2 Proving Deadlock Freedom of Bounded Systems**

A bounded system $\mathcal{S}$ is deadlock-free if and only if its corresponding marked PN $\mathcal{N}_{\mathcal{S}}$ is deadlock-free. In the following, we prove deadlock-freedom of a bounded system by defining a class of invariants that are particularly useful for excluding unreachable deadlock markings.

Given a Petri net $N = (S, T, E)$, a set of places $W \subseteq S$ is called a *trap* if and only if $W^{\bullet} \subseteq {}^{\bullet}W$. A trap $W$ of $N$ is a *marked trap* of the marked PN $\mathcal{N} = (N, \mathrm{m}_0)$ if and only if $\mathrm{m}_0(s) = 1$ for some $s \in W$. A *minimal marked trap* is a marked trap such that none of its strict subsets is a marked trap. A marked trap defines an invariant of the PN because some place in the trap will always be marked, no matter which transition is fired. The *trap invariant* of $\mathcal{N}$ is the least set of markings that mark each trap of $\mathcal{N}$. Clearly, the trap invariant of $\mathcal{N}$ subsumes the set of reachable markings of $\mathcal{N}$, because the latter is the least invariant of $\mathcal{N}$ and invariants are closed under intersection<sup>1</sup>.
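The trap condition $W^{\bullet} \subseteq {}^{\bullet}W$ says that every transition consuming a token from $W$ also produces a token in $W$. A minimal sketch of this check is given below; the net is a hypothetical mutual-exclusion example and the function names are assumptions made for illustration.

```python
# Hypothetical net: a transition is a (pre-places, post-places) pair.
TRANSITIONS = [
    (frozenset({"r", "w1"}), frozenset({"s", "u1"})),
    (frozenset({"r", "w2"}), frozenset({"s", "u2"})),
    (frozenset({"s", "u1"}), frozenset({"r", "w1"})),
    (frozenset({"s", "u2"}), frozenset({"r", "w2"})),
]
INITIAL = frozenset({"r", "w1", "w2"})

def is_trap(places, transitions):
    """W is a trap iff each transition with a pre-place in W has a post-place in W."""
    places = frozenset(places)
    return all(post & places for pre, post in transitions if pre & places)

def is_marked_trap(places, transitions, initial):
    """A marked trap additionally contains an initially marked place."""
    return is_trap(places, transitions) and bool(frozenset(places) & initial)

for candidate in [{"r", "s"}, {"r", "u1", "u2"}, {"r"}, {"u1", "u2"}]:
    print(sorted(candidate), is_marked_trap(candidate, TRANSITIONS, INITIAL))
```

In this example the set `{"r", "u1", "u2"}` is a trap although it mixes places of different components, which is the kind of trap that is easy to miss when enumerating by hand.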

**Lemma 1.** *Given a bounded system* $\mathcal{S}$*, the boolean formula:*

$$\mathit{Trap}(\mathcal{N}_{\mathcal{S}}) \stackrel{\text{def}}{=} \bigwedge \Big\{ \bigvee_{i=1}^{k} s_i \;\Big|\; \{s_1, \ldots, s_k\} \text{ is a marked trap of } \mathcal{N}_{\mathcal{S}} \Big\}$$

*defines an invariant of* $\mathcal{N}_{\mathcal{S}}$*.*

<sup>1</sup> The intersection of two or more invariants is again an invariant.

<sup>2</sup> We have assumed that each port is associated with a unique transition rule.

Next, we describe a method of computing trap invariants that does not explicitly enumerate all the marked traps of a marked PN. First, we consider a *trap constraint* $\Theta(\Gamma)$, derived from the interaction formula $\Gamma$ in linear time. By slight abuse of notation, we define, for a given port $p \in \mathsf{P}^i$ of the component $\mathcal{C}^i$, for some $i \in [1, n]$, the pre- and post-state of $p$ in $\mathcal{C}^i$ as ${}^{\bullet}p \stackrel{\text{def}}{=} s$ and $p^{\bullet} \stackrel{\text{def}}{=} s'$, where $s \xrightarrow{p} s'$ is the unique rule<sup>2</sup> involving $p$ in $\Delta^i$, and ${}^{\bullet}p = p^{\bullet} \stackrel{\text{def}}{=} \bot$ if there is no such rule. Assuming that the interaction formula is written in DNF as $\Gamma = \bigvee_{k=1}^{N} \bigwedge_{\ell=1}^{M_k} p_{k\ell}$, we define the trap constraint:

$$\Theta(\Gamma) \stackrel{\text{def}}{=} \bigwedge_{k=1}^{N} \left( \bigvee_{\ell=1}^{M_k} {}^{\bullet}p_{k\ell} \right) \to \left( \bigvee_{\ell=1}^{M_k} p_{k\ell}{}^{\bullet} \right).$$

It is not hard to show<sup>3</sup> that any satisfying valuation of $\Theta(\Gamma)$ defines a trap of $\mathcal{N}_{\mathcal{S}}$ and, moreover, any such trap is defined in this way. We also consider the formula $\mathit{Init}(\mathcal{S}) \stackrel{\text{def}}{=} \bigvee_{k=1}^{n} s_0^k$ defining the set of initially marked places of $\mathcal{S}$, and prove the following:

**Lemma 2.** *Let* $\mathcal{S}$ *be a bounded system with interaction formula* $\Gamma$ *and* $\beta$ *be a boolean valuation. Then* $\beta \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]$ *iff* $\{s \mid \beta(s) = \top\}$ *is a marked trap of* $\mathcal{N}_{\mathcal{S}}$*. Moreover,* $\beta \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]^{\mu}$ *iff* $\{s \mid \beta(s) = \top\}$ *is a minimal marked trap of* $\mathcal{N}_{\mathcal{S}}$*.*
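The correspondence stated in Lemma 2 can be checked on a small instance by enumerating the models of $\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})$. The sketch below does this by brute force for a hypothetical mutual-exclusion system; the tables `PRE` and `POST` (pre- and post-states of ports), the DNF `GAMMA` and the initial states are assumptions chosen for illustration, not data taken from the figures.

```python
from itertools import combinations

# Hypothetical system: pre- and post-state of each port.
PRE = {"a": "r", "e": "s", "b1": "w1", "f1": "u1", "b2": "w2", "f2": "u2"}
POST = {"a": "s", "e": "r", "b1": "u1", "f1": "w1", "b2": "u2", "f2": "w2"}
# Interaction formula in DNF: each tuple is a minterm (a conjunction of ports).
GAMMA = [("a", "b1"), ("a", "b2"), ("e", "f1"), ("e", "f2")]
INITIAL_STATES = frozenset({"r", "w1", "w2"})
STATES = sorted(set(PRE.values()) | set(POST.values()))

def theta(true_states):
    """Trap constraint: per minterm, some true pre-state implies some true post-state."""
    return all(
        any(POST[p] in true_states for p in minterm)
        for minterm in GAMMA
        if any(PRE[p] in true_states for p in minterm)
    )

def init(true_states):
    """Init(S): at least one initial state is true."""
    return bool(true_states & INITIAL_STATES)

def marked_traps():
    """All models of Theta(Gamma) AND Init(S), as sets of true states."""
    subsets = (frozenset(c) for r in range(len(STATES) + 1)
               for c in combinations(STATES, r))
    return [w for w in subsets if theta(w) and init(w)]

def minimal(sets):
    """Keep the sets that have no strict subset in the collection."""
    return [w for w in sets if not any(other < w for other in sets)]

MINIMAL_TRAPS = minimal(marked_traps())
for trap in MINIMAL_TRAPS:
    print(sorted(trap))
```

Enumeration is exponential in the number of states, which is why the symbolic computation is preferred over this kind of search.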

Because Θ(Γ) and *Init*(S) are boolean formulae, it is, in principle, possible to compute the trap invariant *Trap*(NS) by enumerating the (minimal) models of Θ(Γ)∧*Init*(S) and applying the definition from Lemma 1. However, model enumeration is inefficient and, moreover, does not admit generalization for the parametric case, in which the size of the system is unknown. For these reasons, we prefer a computation of the trap invariant, based on two symbolic transformations of boolean formulae, described next.

First, for a formula $f$ we denote by $f^{+}$ the positive formula obtained by deleting all negative literals from the DNF of $f$. We shall call this operation *positivation*. Second, for a positive boolean formula $f$, we define the *dual* formula $f^{\sim}$ recursively on the structure of $f$, as follows: $(f_1 \wedge f_2)^{\sim} \stackrel{\text{def}}{=} f_1^{\sim} \vee f_2^{\sim}$, $(f_1 \vee f_2)^{\sim} \stackrel{\text{def}}{=} f_1^{\sim} \wedge f_2^{\sim}$ and $a^{\sim} \stackrel{\text{def}}{=} a$, for any $a \in \mathsf{BVar}$. Note that $f^{\sim}$ is equivalent to the negation of the formula obtained from $f$ by substituting each variable $a$ with $\neg a$.
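Both transformations are straightforward on formulae kept in DNF. The following sketch, under the assumption that a minterm is stored as a pair (positive variables, negated variables), applies positivation and dualization to the illustrative formula (p ∧ q) ∨ (p ∧ ¬r) and checks by brute force the remark that the dual is the negation of the formula with all variables negated. The representation and names are hypothetical.

```python
from itertools import product

def positivation(dnf):
    """f+: delete the negative literals of each minterm (positives, negatives)."""
    return [frozenset(pos) for pos, _neg in dnf]

def dualize(positive_dnf):
    """f~ of a positive DNF: the same variable sets, now read as clauses of a CNF."""
    return [frozenset(m) for m in positive_dnf]

def eval_dnf(minterms, beta):
    return any(all(beta[v] for v in m) for m in minterms)

def eval_cnf(clauses, beta):
    return all(any(beta[v] for v in c) for c in clauses)

# (p AND q) OR (p AND NOT r)
DNF = [({"p", "q"}, set()), ({"p"}, {"r"})]
POSITIVE = positivation(DNF)   # (p AND q) OR p
DUAL = dualize(POSITIVE)       # (p OR q) AND p

def dual_matches_negated_substitution(variables=("p", "q", "r")):
    """Check f~(beta) == NOT f(NOT beta) on every valuation."""
    for bits in product([False, True], repeat=len(variables)):
        beta = dict(zip(variables, bits))
        flipped = {v: not b for v, b in beta.items()}
        if eval_cnf(DUAL, beta) != (not eval_dnf(POSITIVE, flipped)):
            return False
    return True

print([sorted(c) for c in DUAL], dual_matches_negated_substitution())
```

Each clause of the resulting CNF is read as "some place of this set is marked", which is how a set of places enters the trap invariant.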

The following theorem gives the main result of this section, the symbolic computation of the trap invariant of a bounded system, directly from its interaction formula.

**Theorem 1.** *For any bounded system* $\mathcal{S}$*, with interaction formula* $\Gamma$*, we have:*

$$\mathit{Trap}(\mathcal{N}_{\mathcal{S}}) \equiv \left[\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})\right]^{+\sim}$$

Intuitively, any satisfying valuation of $\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})$ defines an initially marked trap of $\mathcal{N}_{\mathcal{S}}$ and a minimal such valuation defines a minimal such trap (Lemma 2). Instead of computing the minimal satisfying valuations by model enumeration, we directly cast the above formula in DNF and remove the negative literals. This is essentially because the negative literals do not occur in the propositional definition of a set of places<sup>4</sup>. Then the dualization of this positive formula yields the trap invariant in CNF, as a conjunction over disjunctions of propositional variables corresponding to the places inside a minimal initially marked trap.

<sup>3</sup> See [5] for a proof.

<sup>4</sup> If the DNF is $(p \wedge q) \vee (p \wedge \neg r)$, the dualization would give $(p \vee q) \wedge (p \vee \neg r)$. The first clause corresponds to the trap $\{p, q\}$ (either $p$ or $q$ is marked), but the second does not directly define a trap. However, by first removing the negative literals, we obtain the positive DNF $(p \wedge q) \vee p$, whose dual $(p \vee q) \wedge p$ yields the traps $\{p, q\}$ and $\{p\}$.

Like any other invariant, the trap invariant can be used to prove absence of deadlocks in a bounded system. Assuming, as before, that the interaction formula is given in DNF as $\Gamma = \bigvee_{k=1}^{N} \bigwedge_{\ell=1}^{M_k} p_{k\ell}$, we define the set of deadlock markings of $\mathcal{N}_{\mathcal{S}}$ by the formula $\Delta(\Gamma) \stackrel{\text{def}}{=} \bigwedge_{k=1}^{N} \bigvee_{\ell=1}^{M_k} \neg({}^{\bullet}p_{k\ell})$. This is the set of configurations in which all interactions are disabled. With this definition, proving deadlock freedom amounts to proving unsatisfiability of a boolean formula.

**Corollary 1.** *A bounded system* $\mathcal{S}$ *with interaction formula* $\Gamma$ *is deadlock-free if the boolean formula* $[\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]^{+\sim} \wedge \Delta(\Gamma)$ *is unsatisfiable.*
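The check of Corollary 1 can be sketched on a small instance by testing satisfiability exhaustively. The example below is a hypothetical mutual-exclusion system; the clauses of the trap invariant are assumed to have been computed beforehand (one clause per minimal marked trap), and a real implementation would hand the formula to a SAT solver rather than enumerate valuations.

```python
from itertools import product

STATES = ["r", "s", "w1", "u1", "w2", "u2"]
# Assumed precomputed: one clause per minimal marked trap of the hypothetical system.
TRAP_CLAUSES = [
    {"r", "s"}, {"w1", "u1"}, {"w2", "u2"},
    {"r", "u1", "u2"}, {"s", "w1", "w2"},
]
# Pre-states of the ports of each interaction (minterm) of the interaction formula.
INTERACTION_PRESTATES = [{"r", "w1"}, {"r", "w2"}, {"s", "u1"}, {"s", "u2"}]

def satisfies_clauses(beta, clauses):
    """Conjunction of disjunctions of state variables."""
    return all(any(beta[s] for s in clause) for clause in clauses)

def deadlock(beta):
    """Delta(Gamma): every interaction has at least one unmarked pre-state."""
    return all(any(not beta[s] for s in pre) for pre in INTERACTION_PRESTATES)

def potential_deadlocks(clauses):
    """Valuations satisfying both the invariant clauses and the deadlock formula."""
    found = []
    for bits in product([False, True], repeat=len(STATES)):
        beta = dict(zip(STATES, bits))
        if satisfies_clauses(beta, clauses) and deadlock(beta):
            found.append(beta)
    return found

print("with all trap clauses:", len(potential_deadlocks(TRAP_CLAUSES)))
print("with per-component clauses only:", len(potential_deadlocks(TRAP_CLAUSES[:3])))
```

In this example the three per-component clauses alone leave spurious deadlock candidates, whereas the full set of minimal traps makes the conjunction unsatisfiable, so the system is reported deadlock-free.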

### **2 Parametric Component-Based Systems**

From now on we shall focus on parametric systems, consisting of a fixed set of component types $\mathcal{C}^1, \ldots, \mathcal{C}^n$, such that the number of instances of each type is not known in advance. These numbers are given by a function $\mathsf{M} : [1, n] \to \mathbb{N}$, where $\mathsf{M}(k)$ denotes the number of components of type $\mathcal{C}^k$ that are active in the system. To simplify the technical presentation of the results, we assume that all instances of a component type are created at once, before the system is started<sup>5</sup>. For the rest of this section, we fix a parametric system $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^n, \mathsf{M}, \Gamma \rangle$, where each component type $\mathcal{C}^k = \langle \mathsf{P}^k, \mathsf{S}^k, s_0^k, \Delta^k \rangle$ has the same definition as a component in a bounded system and $\Gamma$ is an interaction formula, written in the fragment of first order logic defined next.

#### **2.1 Monadic Interaction Logic**

For each component type $\mathcal{C}^k$, where $k \in [1, n]$, we assume a set of index variables $\mathsf{Var}^k$ and a set of predicate symbols $\mathsf{Pred}^k \stackrel{\text{def}}{=} \mathsf{P}^k \cup \mathsf{S}^k$. Similar to the bounded case, we use state and port names as monadic (unary) predicate symbols. We also define the sets $\mathsf{Var} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} \mathsf{Var}^k$ and $\mathsf{Pred} \stackrel{\text{def}}{=} \bigcup_{k=1}^{n} \mathsf{Pred}^k$. Moreover, we consider that $\mathsf{Var}^k \cap \mathsf{Var}^\ell = \emptyset$ and $\mathsf{Pred}^k \cap \mathsf{Pred}^\ell = \emptyset$, for all $1 \le k < \ell \le n$. For simplicity's sake, we assume that all predicate symbols in $\mathsf{Pred}$ are of arity one. For component types $\mathcal{C}^k$ such that $\mathsf{M}(k) = 1$ and predicate symbols $\mathsf{pr} \in \mathsf{Pred}^k$, we shall write $\mathsf{pr}$ instead of $\mathsf{pr}(1)$, as in the interaction formula of the system from Fig. 1b. The syntax of the *monadic interaction logic* (MIL) is given below:

$$\begin{array}{l} i, j \in \mathsf{Var} \quad \text{index variables} \\ \phi := i = j \mid \mathsf{pr}(i) \mid \phi_1 \wedge \phi_2 \mid \neg \phi_1 \mid \exists i \,.\, \phi_1 \end{array}$$

where, for each predicate atom $\mathsf{pr}(i)$, if $\mathsf{pr} \in \mathsf{Pred}^k$ and $i \in \mathsf{Var}^\ell$ then $k = \ell$. We use the shorthands $\forall i \,.\, \phi_1 \stackrel{\text{def}}{=} \neg(\exists i \,.\, \neg\phi_1)$ and $\mathrm{distinct}(i_1, \ldots, i_m) \stackrel{\text{def}}{=} \bigwedge_{1 \le j < \ell \le m} \neg i_j = i_\ell$<sup>6</sup>. A *sentence* is a formula in which all variables are in the scope of a quantifier. A formula is *positive* if each predicate symbol occurs under an even number of negations. The semantics of MIL is given in terms of structures $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$, where:

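– $\mathfrak{U} \stackrel{\text{def}}{=} [1, \max_{k=1}^{n} \mathsf{M}(k)]$ is the *universe* of instances, over which variables range,

– $\nu : \mathsf{Var} \to \mathfrak{U}$ is a *valuation* mapping variables to elements of the universe,

– $\iota : \mathsf{Pred} \to 2^{\mathfrak{U}}$ is an *interpretation* of predicates as subsets of the universe.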

<sup>5</sup> This is not a limitation, since dynamic instance creation can be simulated by considering that all instances are initially in a waiting state, which is left as result of an interaction involving a designated "spawn" port.

<sup>6</sup> Throughout this paper, we consider that $\bigwedge_{i \in I} \phi_i = \top$ if $I = \emptyset$.


For a structure I = (U, ν, ι) and a formula φ, the satisfaction relation I |= φ is defined as:

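$$\begin{array}{lcl} \mathcal{I} \models \bot & \Leftrightarrow & \text{never} \\ \mathcal{I} \models i = j & \Leftrightarrow & \nu(i) = \nu(j) \\ \mathcal{I} \models p(i) & \Leftrightarrow & \nu(i) \in \iota(p) \\ \mathcal{I} \models \exists i \,.\, \phi_1 & \Leftrightarrow & (\mathfrak{U}, \nu[i \leftarrow m], \iota) \models \phi_1 \text{ for some } m \in [1, \mathsf{M}(k)], \text{ provided that } i \in \mathsf{Var}^k \end{array}$$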

where $\nu[i \leftarrow m]$ is the valuation that acts as $\nu$, except for $i$, which is assigned to $m$. Whenever $\mathcal{I} \models \phi$, we say that $\mathcal{I}$ is a *model* of $\phi$. It is known that, if a MIL formula has a model, then it has a model with universe of cardinality at most exponential in the size (number of symbols) of the formula [19]. This result, due to Löwenheim, is among the first decidability results for a fragment of first order logic.

Structures are partially ordered by pointwise inclusion, i.e. for $\mathcal{I}_i = (\mathfrak{U}, \nu_i, \iota_i)$, for $i = 1, 2$, we write $\mathcal{I}_1 \subseteq \mathcal{I}_2$ iff $\iota_1(p) \subseteq \iota_2(p)$, for all $p \in \mathsf{Pred}$, and $\mathcal{I}_1 \subset \mathcal{I}_2$ iff $\mathcal{I}_1 \subseteq \mathcal{I}_2$ and $\mathcal{I}_1 \neq \mathcal{I}_2$. As before, we define the sets $[\![\phi]\!] = \{\mathcal{I} \mid \mathcal{I} \models \phi\}$ and $[\![\phi]\!]^{\mu} = \{\mathcal{I} \in [\![\phi]\!] \mid \forall \mathcal{I}' \,.\, \mathcal{I}' \subset \mathcal{I} \to \mathcal{I}' \notin [\![\phi]\!]\}$ of models and minimal models of a MIL formula, respectively. Given formulae $\phi_1$ and $\phi_2$, we write $\phi_1 \equiv \phi_2$ for $[\![\phi_1]\!] = [\![\phi_2]\!]$ and $\phi_1 \equiv^{\mu} \phi_2$ for $[\![\phi_1]\!]^{\mu} = [\![\phi_2]\!]^{\mu}$.

#### **2.2 Execution Semantics of Parametric Systems**

We consider the interaction formulae of parametric systems to be finite disjunctions of formulae of the form below:

$$\exists i_1 \ldots \exists i_\ell \,.\, \varphi \wedge \bigwedge_{j=1}^{\ell} p_j(i_j) \wedge \bigwedge_{j=\ell+1}^{\ell+m} \forall i_j \,.\, \psi_j \to p_j(i_j) \tag{1}$$

where $\varphi, \psi_{\ell+1}, \ldots, \psi_{\ell+m}$ are conjunctions of equalities and disequalities involving index variables. Intuitively, the formulae (1) state that there are at most $\ell$ component instances that engage in a multiparty rendez-vous interaction on ports $p_1(i_1), \ldots, p_\ell(i_\ell)$, together with a broadcast to the ports $p_{\ell+1}(i_{\ell+1}), \ldots, p_{\ell+m}(i_{\ell+m})$ of the instances that fulfill the constraints $\psi_{\ell+1}, \ldots, \psi_{\ell+m}$. Observe that, if $m = 0$, the above formula corresponds to a multiparty (generalized) rendez-vous interaction $\exists i_1 \ldots \exists i_\ell \,.\, \varphi \wedge \bigwedge_{j=1}^{\ell} p_j(i_j)$. An example of peer-to-peer rendez-vous is the parametric system from Fig. 1. Another example, involving broadcast, is given below.

*Example 1.* Consider the parametric system obtained from an arbitrary number of *Worker* components (Fig. 3), where C<sup>1</sup> = *Worker*, Var<sup>1</sup> = {*i*, *i*1, *i*2, *j*} and Pred<sup>1</sup> = {*a*, *b*, *f*, *u*,*w*}. Any pair of instances can jointly execute the *b* (*begin*) action provided *all* others are taking the *a* (*await*) action. Any instance can also execute alone the *f* (*finish*) action.

**Fig. 3.** Parametric system with broadcast

The execution semantics of a parametric system $\mathcal{S}$ is the marked PN $\mathcal{N}_{\mathcal{S}} = (N, \mathrm{m}_0)$, where $N = (\bigcup_{k=1}^{n} \mathsf{S}^k \times [1, \mathsf{M}(k)], T, E)$, $\mathrm{m}_0((s_0^k, i)) = 1$, for all $k \in [1, n]$ and $i \in [1, \mathsf{M}(k)]$, and the sets of transitions $T$ and edges $E$ are defined next. For each minimal model $\mathcal{I} = (\mathfrak{U}, \nu, \iota) \in [\![\Gamma]\!]^{\mu}$, we have a transition $\mathrm{t}_{\mathcal{I}} \in T$ and the edges $((s_i, k), \mathrm{t}_{\mathcal{I}}), (\mathrm{t}_{\mathcal{I}}, (s_i', k)) \in E$ for all $i \in [1, n]$ such that $s_i \xrightarrow{p_i} s_i' \in \Delta^i$ and $k \in \iota(p_i)$. Moreover, nothing else is in $T$ or $E$.

As a remark, unlike in the case of bounded systems, the size of the marked PN NS, that describes the execution semantics of a parametric system S, depends on the maximum number of instances of each component type. The definition of the trap invariant *Trap*(NS) is the same as in the bounded case, except that, in this case, the size of the boolean formula depends on the (unbounded) number of instances in the system. The challenge, addressed in the following, is to define trap invariants using MIL formulae of a fixed size.

#### **2.3 Computing Parametric Trap Invariants**

To start with, we define the trap constraint of an interaction formula $\Gamma$, consisting of a finite disjunction of formulae of the form (1), as a finite conjunction of formulae of the form below:

$$\begin{aligned} &\forall i_1 \ldots \forall i_\ell \,.\, \Big[ \varphi \wedge \Big( \bigvee_{j=1}^{\ell} {}^{\bullet}p_j(i_j) \vee \bigvee_{j=\ell+1}^{\ell+m} \exists i_j \,.\, \psi_j \wedge {}^{\bullet}p_j(i_j) \Big) \Big] \rightarrow \\ &\qquad \Big[ \bigvee_{j=1}^{\ell} p_j^{\bullet}(i_j) \vee \bigvee_{j=\ell+1}^{\ell+m} \exists i_j \,.\, \psi_j \wedge p_j^{\bullet}(i_j) \Big] \end{aligned}$$

where, for a port $p \in \mathsf{P}^k$ of some component type $\mathcal{C}^k$, ${}^{\bullet}p(i)$ and $p^{\bullet}(i)$ denote the unique predicate atoms $s(i)$ and $s'(i)$, such that $s \xrightarrow{p} s' \in \Delta^k$ is the (unique) transition rule involving $p$, or $\bot$ if there is no such rule.

*Example 2.* The trap constraint for the parametric (rendez-vous) system in Fig. 1b is $\forall i . [r \vee w(i)] \to [s \vee u(i)] \wedge \forall i . [s \vee u(i)] \to [r \vee u(i)]$. Analogously, the trap constraint for the parametric (broadcast) system in Fig. 3 is:

$$\begin{aligned} &\forall i_1 . \forall i_2 . \big[ i_1 \neq i_2 \wedge \big( w(i_1) \vee w(i_2) \vee \exists j . (j \neq i_1 \wedge j \neq i_2 \wedge w(j)) \big) \big] \to \\ &\qquad \big[ i_1 \neq i_2 \wedge \big( u(i_1) \vee u(i_2) \vee \exists j . (j \neq i_1 \wedge j \neq i_2 \wedge w(j)) \big) \big] \\ &\wedge\; \forall i . \, u(i) \to w(i) \end{aligned}$$

We define a translation of MIL formulae into boolean formulae of unbounded size. Given a function $\mathsf{M} : [1, n] \to \mathbb{N}$, the *unfolding* of a MIL sentence $\phi$ is the boolean formula $\mathrm{B}_{\mathsf{M}}(\phi)$ obtained by replacing each existential [universal] quantifier $\exists i \,.\, \psi(i)$ [$\forall i \,.\, \psi(i)$], for $i \in \mathsf{Var}^k$, by a finite disjunction [conjunction] $\bigvee_{\ell=1}^{\mathsf{M}(k)} \psi[\ell/i]$ [$\bigwedge_{\ell=1}^{\mathsf{M}(k)} \psi[\ell/i]$], where the substitution of the constant $\ell \in [1, \mathsf{M}(k)]$ for the variable $i$ is defined recursively as usual, except for $\mathsf{pr}(i)[\ell/i] \stackrel{\text{def}}{=} (\mathsf{pr}, \ell)$, which is a propositional variable. Further, we relate structures to boolean valuations of unbounded sizes. For a structure $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$ we define the boolean valuation $\beta_{\mathcal{I}}((\mathsf{pr}, \ell)) = \top$ if and only if $\ell \in \iota(\mathsf{pr})$, for each predicate symbol $\mathsf{pr}$ and each integer constant $\ell$. Conversely, for each valuation $\beta$ of the propositional variables $(\mathsf{pr}, \ell)$, there exists a structure $\mathcal{I}_\beta = (\mathfrak{U}, \nu, \iota)$ such that $\iota(\mathsf{pr}) \stackrel{\text{def}}{=} \{\ell \mid \beta((\mathsf{pr}, \ell)) = \top\}$, for each $\mathsf{pr} \in \mathsf{Pred}$. The following lemma relates the semantics of MIL formulae with that of their boolean unfoldings:

**Lemma 3.** *Given a* MIL *sentence* $\phi$ *and a function* $\mathsf{M} : [1, n] \to \mathbb{N}$*, the following hold:*


Considering the MIL formula $\mathit{Init}(\mathcal{S}) \stackrel{\text{def}}{=} \bigvee_{k=1}^{n} \exists i_k \,.\, s_0^k(i_k)$, that defines the set of initial configurations of a parametric system $\mathcal{S}$, the following lemma formalizes the intuition behind the definition of parametric trap constraints:

**Lemma 4.** *Let* $\mathcal{S}$ *be a parametric system with interaction formula* $\Gamma$ *and* $\mathcal{I}$ *be a structure. Then* $\mathcal{I} \models \Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})$ *iff* $\{(s, k) \mid k \in \iota(s)\}$ *is a marked trap of* $\mathcal{N}_{\mathcal{S}}$*. Moreover,* $\mathcal{I} \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]^{\mu}$ *iff* $\{(s, k) \mid k \in \iota(s)\}$ *is a minimal marked trap of* $\mathcal{N}_{\mathcal{S}}$*.*
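The unfolding of a MIL sentence into a boolean formula, as defined earlier in this subsection, can be sketched as a recursive traversal. The tuple-based syntax tree, the tags and the function names below are assumptions made for this illustration; quantified variables carry the index `k` of their component type, and the number of instances is given by a dictionary `M`.

```python
def unfold(phi, M, env=None):
    """Replace quantifiers over type k by disjunctions/conjunctions over 1..M[k]."""
    env = env or {}
    tag = phi[0]
    if tag == "pred":                      # pr(i) becomes the propositional variable (pr, l)
        _, pr, i = phi
        return ("var", (pr, env[i]))
    if tag == "eq":                        # equality between constants is decided at once
        _, i, j = phi
        return ("const", env[i] == env[j])
    if tag == "not":
        return ("not", unfold(phi[1], M, env))
    if tag in ("and", "or"):
        return (tag, [unfold(phi[1], M, env), unfold(phi[2], M, env)])
    if tag in ("exists", "forall"):
        _, i, k, body = phi
        parts = [unfold(body, M, {**env, i: l}) for l in range(1, M[k] + 1)]
        return ("or" if tag == "exists" else "and", parts)
    raise ValueError(f"unknown node {tag!r}")

def holds(prop, beta):
    """Evaluate an unfolded formula; variables absent from beta are false."""
    tag = prop[0]
    if tag == "var":
        return beta.get(prop[1], False)
    if tag == "const":
        return prop[1]
    if tag == "not":
        return not holds(prop[1], beta)
    if tag == "and":
        return all(holds(p, beta) for p in prop[1])
    if tag == "or":
        return any(holds(p, beta) for p in prop[1])
    raise ValueError(f"unknown node {tag!r}")

# Hypothetical sentence: forall i . u(i) -> w(i), with two instances of type 1.
PHI = ("forall", "i", 1, ("or", ("not", ("pred", "u", "i")), ("pred", "w", "i")))
M = {1: 2}
print(unfold(PHI, M))
```

The size of the unfolded formula grows with `M`, which is exactly why the trap invariant of a parametric system is sought as a MIL formula of fixed size.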

We are now left with the task of computing a MIL formula which defines the trap invariant $\mathit{Trap}(\mathcal{N}_{\mathcal{S}})$ of a parametric component-based system $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^n, \mathsf{M}, \Gamma \rangle$. The difficulty lies in the fact that the size of $\mathcal{N}_{\mathcal{S}}$, and thus that of the boolean formula $\mathit{Trap}(\mathcal{N}_{\mathcal{S}})$, depends on the number $\mathsf{M}(k)$ of instances of each component type $k \in [1, n]$. As we aim at computing an invariant able to prove safety properties, such as deadlock freedom, independently of how many components are present in the system, we must define the trap invariant using a formula depending exclusively on $\Gamma$, i.e. not on $\mathsf{M}$.

Observe first that $\mathit{Trap}(\mathcal{N}_{\mathcal{S}})$ can be equivalently defined using only the minimal marked traps of $\mathcal{N}_{\mathcal{S}}$, which, by Lemma 4, are exactly the sets $\{(s, k) \mid k \in \iota(s)\}$, defined by some structure $(\mathfrak{U}, \nu, \iota) \in [\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]^{\mu}$. Assuming that the set of structures $[\![\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})]\!]^{\mu}$, or an over-approximation of it, can be defined by a positive MIL formula, the trap invariant is defined using a generalization of boolean dualisation to predicate logic, defined recursively, as follows:

$$\begin{array}{llll} (i = j)^{\sim} \stackrel{\text{def}}{=} \neg i = j & (\phi_1 \vee \phi_2)^{\sim} \stackrel{\text{def}}{=} \phi_1^{\sim} \wedge \phi_2^{\sim} & (\exists i \,.\, \phi_1)^{\sim} \stackrel{\text{def}}{=} \forall i \,.\, \phi_1^{\sim} & p(i)^{\sim} \stackrel{\text{def}}{=} p(i) \\ (\neg i = j)^{\sim} \stackrel{\text{def}}{=} i = j & (\phi_1 \wedge \phi_2)^{\sim} \stackrel{\text{def}}{=} \phi_1^{\sim} \vee \phi_2^{\sim} & (\forall i \,.\, \phi_1)^{\sim} \stackrel{\text{def}}{=} \exists i \,.\, \phi_1^{\sim} & \end{array}$$
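The recursive definition above translates directly into a structural recursion. The sketch below assumes a hypothetical tuple encoding of positive MIL formulae in which disequalities have their own tag `"neq"`; it is meant only to make the rules concrete.

```python
def dual(phi):
    """Dualisation of a positive MIL formula given as nested tuples."""
    tag = phi[0]
    if tag == "eq":                 # (i = j)~  is  not i = j
        return ("neq",) + phi[1:]
    if tag == "neq":                # (not i = j)~  is  i = j
        return ("eq",) + phi[1:]
    if tag == "pred":               # p(i)~  is  p(i)
        return phi
    if tag == "and":
        return ("or", dual(phi[1]), dual(phi[2]))
    if tag == "or":
        return ("and", dual(phi[1]), dual(phi[2]))
    if tag == "exists":
        return ("forall", phi[1], dual(phi[2]))
    if tag == "forall":
        return ("exists", phi[1], dual(phi[2]))
    raise ValueError(f"unknown node {tag!r}")

# Hypothetical formula: exists i . w(i) and forall j . (i = j or u(j))
PHI = ("exists", "i",
       ("and", ("pred", "w", "i"),
        ("forall", "j", ("or", ("eq", "i", "j"), ("pred", "u", "j")))))
print(dual(PHI))
```

Since every rule swaps a construct with its counterpart, applying `dual` twice returns the original formula.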

The crux of the method is the ability to define, given an arbitrary MIL formula $\phi$, a positive MIL formula $\phi^{\oplus}$ that preserves its minimal models, formally $\phi \equiv^{\mu} \phi^{\oplus}$. Because of quantification over unbounded domains, a MIL formula $\phi$ does not have a disjunctive normal form and thus one cannot define $\phi^{\oplus}$ by simply deleting the negative literals in DNF, as was done for the definition of the positivation operation $(.)^{+}$ in the propositional case. For now we assume that the transformation $(.)^{\oplus}$ of monadic predicate formulae into positive formulae preserving minimal models is defined (a detailed presentation of this step is given next in Sect. 3) and close this section with a parametric counterpart of Theorem 1.

**Theorem 2.** *For any parametric system* $\mathcal{S} = \langle \mathcal{C}^1, \ldots, \mathcal{C}^n, \mathsf{M}, \Gamma \rangle$*, we have*

$$\mathit{Trap}(\mathcal{N}_{\mathcal{S}}) \equiv \mathrm{B}_{\mathsf{M}}\left(\left(\left(\Theta(\Gamma) \wedge \mathit{Init}(\mathcal{S})\right)^{\oplus}\right)^{\sim}\right)$$

#### **3 Cardinality Constraints**

This section is concerned with the definition of a *positivation* operator $(.)^{\oplus}$ for MIL sentences, whose only requirements are that $\phi^{\oplus}$ is positive and $\phi \equiv^{\mu} \phi^{\oplus}$. For this purpose, we use a logic of quantifier-free *boolean cardinality constraints* [4,18] as an intermediate language, on which the positive formulae are defined. The translation of MIL into cardinality constraints is done by an equivalence-preserving quantifier elimination procedure, described in Sect. 3.1. As a byproduct, since the satisfiability of quantifier-free cardinality constraints is NP-complete [18] and integrated with SMT [4], we obtain a practical decision procedure for MIL that does not use model enumeration, as suggested by the small model property [19]. Finally, the definition of a positive MIL formula from a boolean combination of quantifier-free cardinality constraints is given in Sect. 3.2.

We start by giving the definition of cardinality constraints. Given the set of monadic predicate symbols Pred, a *boolean term* is generated by the syntax:

$$t := \mathsf{pr} \in \mathsf{Pred} \mid \neg t_1 \mid t_1 \wedge t_2 \mid t_1 \vee t_2$$

When there is no risk of confusion, we borrow the terminology of propositional logic and say that a term is in DNF if it is a disjunction of conjunctions (minterms). We also write *t*<sup>1</sup> → *t*<sup>2</sup> if and only if the implication is valid when *t*<sup>1</sup> and *t*<sup>2</sup> are interpreted as boolean formulae, with each predicate symbol viewed as a propositional variable. Two boolean terms *t*<sup>1</sup> and *t*<sup>2</sup> are said to be *compatible* if and only if *t*<sup>1</sup> ∧*t*<sup>2</sup> is satisfiable, when viewed as a boolean formula.

For a boolean term $t$ and a first-order variable $i \in \mathsf{Var}$, we define the shorthand $t(i)$ recursively, as $(\neg t_1)(i) \stackrel{\text{def}}{=} \neg t_1(i)$, $(t_1 \wedge t_2)(i) \stackrel{\text{def}}{=} t_1(i) \wedge t_2(i)$ and $(t_1 \vee t_2)(i) \stackrel{\text{def}}{=} t_1(i) \vee t_2(i)$. Given a positive integer $n \in \mathbb{N}$ and a boolean term $t$, we define the following *cardinality constraints*, by MIL formulae:

$$|t| \ge n \stackrel{\text{def}}{=} \exists i_1 \ldots \exists i_n \,.\, \mathrm{distinct}(i_1, \ldots, i_n) \wedge \bigwedge_{j=1}^{n} t(i_j) \qquad\qquad |t| \le n \stackrel{\text{def}}{=} \neg(|t| \ge n+1)$$

We shall further use cardinality constraints with $n = \infty$, by defining $|t| \ge \infty \stackrel{\text{def}}{=} \bot$ and $|t| \le \infty \stackrel{\text{def}}{=} \top$. The intuitive semantics of cardinality constraints is formally defined in terms of structures $\mathcal{I} = (\mathfrak{U}, \nu, \iota)$ by the semantics of monadic predicate logic, given in the previous section. For instance, $|p \wedge q| \ge 1$ means that the intersection of the sets $p$ and $q$ is not empty, whereas $|\neg p| \le 0$ means that $p$ contains all elements from the universe.
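On a finite structure, a cardinality constraint can be evaluated simply by counting the elements of the universe that satisfy the boolean term. The sketch below assumes a hypothetical tuple encoding of boolean terms and an interpretation `iota` given as a dictionary from predicate names to sets; the bound ∞ is modelled by `float("inf")`, which matches the two conventions above on finite universes.

```python
def holds(term, element, iota):
    """Does `element` satisfy the boolean term under the interpretation iota?"""
    tag = term[0]
    if tag == "pred":
        return element in iota[term[1]]
    if tag == "not":
        return not holds(term[1], element, iota)
    if tag == "and":
        return holds(term[1], element, iota) and holds(term[2], element, iota)
    if tag == "or":
        return holds(term[1], element, iota) or holds(term[2], element, iota)
    raise ValueError(f"unknown node {tag!r}")

def cardinality(term, universe, iota):
    """Number of elements of the universe satisfying the term."""
    return sum(1 for x in universe if holds(term, x, iota))

def at_least(term, n, universe, iota):   # |t| >= n
    return cardinality(term, universe, iota) >= n

def at_most(term, n, universe, iota):    # |t| <= n
    return cardinality(term, universe, iota) <= n

# Hypothetical structure with four instances.
UNIVERSE = range(1, 5)
IOTA = {"p": {1, 2, 3, 4}, "q": {2, 3}}
P, Q = ("pred", "p"), ("pred", "q")

print(at_least(("and", P, Q), 1, UNIVERSE, IOTA))   # p and q intersect
print(at_most(("not", P), 0, UNIVERSE, IOTA))       # p covers the universe
```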

#### **3.1 Quantifier Elimination**

Given a sentence $\phi$, written in MIL, we build an equivalent boolean combination of cardinality constraints $\mathsf{qe}(\phi)$, using quantifier elimination. We describe the elimination of a single existential quantifier; the generalization to several existential or universal quantifiers is immediate. Assume that $\phi = \exists i_1 \,.\, \bigvee_{k \in K} \psi_k(i_1, \ldots, i_m)$, where $K$ is a finite set of indices and, for each $k \in K$, $\psi_k$ is a quantifier-free conjunction of atomic propositions of the form $i_j = i_\ell$, $\mathsf{pr}(i_j)$ and their negations, for some $j, \ell \in [1, m]$. We write, equivalently, $\phi \equiv \bigvee_{k \in K} \varphi_k \wedge \exists i_1 \,.\, \theta_k(i_1, \ldots, i_m)$, where $\varphi_k$ does not contain occurrences of $i_1$ and $\theta_k$ is a conjunction of literals of the form $\mathsf{pr}(i_1)$, $\neg\mathsf{pr}(i_1)$, $i_1 = i_j$ and $\neg i_1 = i_j$, for some $j \in [2, m]$. For each $k \in K$, we distinguish the following cases:

1. if $i_1 = i_j$ is a consequence of $\theta_k$, for some $j > 1$, let $\mathsf{qe}(\exists i_1 \,.\, \theta_k) \stackrel{\text{def}}{=} \theta_k[i_j/i_1]$.
2. else, $\theta_k = \bigwedge_{j \in J_k} \neg i_1 = i_j \wedge t_k(i_1)$ for some $J_k \subseteq [2, m]$ and boolean term $t_k$, and let:

$$\begin{array}{c} \mathsf{qe}(\exists i_1 \,.\, \theta_k) \stackrel{\text{def}}{=} \bigwedge_{J \subseteq J_k} \Big[ \mathrm{distinct}(\{i_j\}_{j \in J}) \wedge \bigwedge_{j \in J} t_k(i_j) \Big] \to |t_k| \ge \|J\| + 1 \\ \mathsf{qe}(\phi) \stackrel{\text{def}}{=} \bigvee_{k \in K} \varphi_k \wedge \mathsf{qe}(\exists i_1 \,.\, \theta_k) \end{array}$$
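The second case can be sanity-checked by brute force on small universes. The sketch below compares, for a boolean term consisting of a single hypothetical predicate `p` and two free variables, the truth value of $\exists i_1 \,.\, \bigwedge_j \neg i_1 = i_j \wedge p(i_1)$ with that of the quantifier-free formula above. It is an illustrative test under these assumptions, not a proof and not the procedure's implementation.

```python
from itertools import combinations, product

def with_quantifier(universe, p, values):
    """exists i1 . (i1 differs from every given value) and p(i1)."""
    return any(x in p and all(x != v for v in values) for x in universe)

def quantifier_free(p, values):
    """For each subset J: distinct values satisfying p imply |p| >= |J| + 1."""
    indices = range(len(values))
    for size in range(len(values) + 1):
        for J in combinations(indices, size):
            chosen = [values[j] for j in J]
            premise = len(set(chosen)) == len(chosen) and all(v in p for v in chosen)
            if premise and not len(p) >= len(J) + 1:
                return False
    return True

def check(max_universe=3, num_vars=2):
    """Compare both formulae on all interpretations of p and all valuations."""
    for n in range(1, max_universe + 1):
        universe = range(1, n + 1)
        for bits in product([False, True], repeat=n):
            p = {x for x, b in zip(universe, bits) if b}
            for values in product(universe, repeat=num_vars):
                if with_quantifier(universe, p, values) != quantifier_free(p, values):
                    return False
    return True

print(check())
```

Intuitively, the subset $J$ that picks one variable per distinct value satisfying the term forces the existence of one more element satisfying it, which can then serve as the witness for $i_1$.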

Universal quantification is dealt with using the duality $\mathsf{qe}(\forall i_1 \,.\, \psi) \stackrel{\text{def}}{=} \neg\mathsf{qe}(\exists i_1 \,.\, \neg\psi)$. For a prenex formula $\phi = Q_n i_n \ldots Q_1 i_1 \,.\, \psi$, where $Q_1, \ldots, Q_n \in \{\exists, \forall\}$ and $\psi$ is quantifier-free, we define, recursively, $\mathsf{qe}(\phi) \stackrel{\text{def}}{=} \mathsf{qe}(Q_n i_n \,.\, \mathsf{qe}(Q_{n-1} i_{n-1} \ldots Q_1 i_1 \,.\, \psi))$. It is easy to see that, if $\phi$ is a sentence, $\mathsf{qe}(\phi)$ is a boolean combination of cardinality constraints. The correctness of the construction is a consequence of the following lemma:

**Lemma 5.** *Given a* MIL *formula* $\phi = Q_n i_n \ldots Q_1 i_1 \,.\, \psi$*, where* $Q_1, \ldots, Q_n \in \{\forall, \exists\}$ *and* $\psi$ *is a quantifier-free conjunction of equality and predicate atoms, we have* $\phi \equiv \mathsf{qe}(\phi)$*.*

*Example 3.* (contd. from Example 2) Below we show the results of quantifier elimination applied to the conjunction Θ(Γ) ∧ *Init*(S) for the system in Fig. 1b:

(¬*r* ∧ ¬*s* ∧ |*w* ∧ ¬*u*| ≤ 0 ∧ |*u* ∧ ¬*w*| ≤ 0 ∧ 1 ≤ |*w*|) ∨ (¬*r* ∧ |*w* ∧ ¬*u*| ≤ 0 ∧ |¬*w*| ≤ 0 ∧ 1 ≤ |*w*|) ∨ (*s* ∧ *r*) ∨ (*s* ∧ |¬*w*| ≤ 0 ∧ 1 ≤ |*w*|) ∨ (¬*s* ∧ |¬*u*| ≤ 0 ∧ |*u* ∧ ¬*w*| ≤ 0 ∧ 1 ≤ |*w*|) ∨ (|¬*u*| ≤ 0 ∧ |¬*w*| ≤ 0 ∧ 1 ≤ |*w*|).

Similarly, for the system in Fig. 3, we obtain the following cardinality constraints:

(3 ≤ |*w*|∧|*u* ∧ ¬*w*| ≤ 0) ∨ (2 ≤ |*w*|∧|*w* ∧ ¬*u*| ≤ 1 ∧ |*u* ∧ ¬*w*| ≤ 0) ∨ (|¬*u*| ≤ 1 ∧ |¬*u* ∧ ¬*w*| ≤ 0 ∧ |*u* ∧ ¬*w*| ≤ 0 ∧ 1 ≤ |*w*|) ∨ (|*w* ∧ ¬*u*| ≤ 0 ∧ |*u* ∧ ¬*w*| ≤ 0 ∧ 1 ≤ |*w*|).

#### **3.2 Building Positive Formulae that Preserve Minimal Models**

Let $\phi$ be a MIL formula, not necessarily positive. We shall build a positive formula $\phi^{\oplus}$, such that $\phi \equiv^{\mu} \phi^{\oplus}$. By the result of the last section, $\phi$ is equivalent to a boolean combination of cardinality constraints $\mathsf{qe}(\phi)$, obtained by quantifier elimination. Thus we assume w.l.o.g. that the DNF of $\phi$ is a disjunction of conjunctions of the form $\bigwedge_{i \in L} |t_i| \ge \ell_i \wedge \bigwedge_{j \in U} |t_j| \le u_j$, for some sets of indices $L$, $U$ and some positive integers $\{\ell_i\}_{i \in L}$ and $\{u_j\}_{j \in U}$.

For a boolean combination of cardinality constraints $\psi$, we denote by $\mathcal{P}(\psi)$ the set of predicate symbols that occur in a boolean term of $\psi$ and by $\mathcal{P}^{+}(\psi)$ ($\mathcal{P}^{-}(\psi)$) the set of predicate symbols that occur under an even (odd) number of negations in $\psi$. The following proposition allows us to restrict the form of $\phi$ even further, without losing generality:

**Proposition 1.** *Given* MIL *formulae* $\phi_1$ *and* $\phi_2$*, for any positivation operator* $(.)^{\oplus}$*, the following hold:*

*1.* $(\phi_1 \vee \phi_2)^{\oplus} \equiv^{\mu} \phi_1^{\oplus} \vee \phi_2^{\oplus}$*,*

*2.* $(\phi_1 \wedge \phi_2)^{\oplus} \equiv^{\mu} \phi_1^{\oplus} \wedge \phi_2^{\oplus}$*, provided that* $\mathcal{P}(\phi_1) \cap \mathcal{P}(\phi_2) = \emptyset$*.*

From now on, we assume that φ is a conjunction of cardinality constraints that cannot be split as φ = φ<sup>1</sup> ∧ φ2, such that P(φ1) ∩ P(φ2) = ∅.

Let us consider a cardinality constraint $|t| \ge \ell$ that occurs in $\phi$. Given a set $\mathcal{P}$ of predicate symbols, for a set of predicates $S \subseteq \mathcal{P}$, the *complete* boolean minterm corresponding to $S$ with respect to $\mathcal{P}$ is $t_S^{\mathcal{P}} \stackrel{\text{def}}{=} \bigwedge_{p \in S} p \wedge \bigwedge_{p \in \mathcal{P} \setminus S} \neg p$. Moreover, let $\mathcal{S}_t \stackrel{\text{def}}{=} \{S \subseteq \mathcal{P}(\phi) \mid t_S^{\mathcal{P}(\phi)} \to t\}$ be the set of sets $S$ of predicate symbols for which the complete minterm $t_S^{\mathcal{P}(\phi)}$ implies $t$. Finally, each cardinality constraint $|t| \ge \ell$ is replaced by the equivalent disjunction<sup>7</sup>, in which each boolean term is complete with respect to $\mathcal{P}(\phi)$:

$$|t| \ge \ell \equiv \bigvee \left\{ \bigwedge_{S \in \mathcal{S}_t} \left| t_S^{\mathcal{P}(\phi)} \right| \ge \ell_S \;\middle|\; \text{for some constants } \{\ell_S \in \mathbb{N}\}_{S \in \mathcal{S}_t} \text{ such that } \sum_{S \in \mathcal{S}_t} \ell_S = \ell \right\}$$

Note that, because any two complete minterms *t<sub>S</sub>* and *t<sub>T</sub>*, for *S* ≠ *T*, are incompatible, necessarily |*t<sub>S</sub>* ∨ *t<sub>T</sub>*| = |*t<sub>S</sub>*| + |*t<sub>T</sub>*|. Thus |*t<sub>S</sub>* ∨ *t<sub>T</sub>*| ≥ ℓ if and only if there exist ℓ<sub>1</sub>, ℓ<sub>2</sub> ∈ ℕ such that ℓ<sub>1</sub> + ℓ<sub>2</sub> = ℓ and |*t<sub>S</sub>*| ≥ ℓ<sub>1</sub>, |*t<sub>T</sub>*| ≥ ℓ<sub>2</sub>, respectively.
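The additivity of complete minterms can be checked by brute force on a small structure. The following Python sketch is only an illustration: the predicate names, the universe and the helper names (`subsets`, `card`, `complete_minterm`) are assumptions made for this example, and a boolean term is represented as a function over the set of predicates that hold at an element.

```python
from itertools import combinations

def subsets(preds):
    """All subsets of a collection of predicate symbols, as frozensets."""
    preds = sorted(preds)
    return [frozenset(c) for r in range(len(preds) + 1)
            for c in combinations(preds, r)]

def card(term, universe, interp):
    """|term|: number of elements whose set of true predicates satisfies term."""
    return sum(1 for e in universe
               if term(frozenset(p for p in interp if e in interp[p])))

def complete_minterm(S):
    """t_S: exactly the predicates in S hold, all others are negated."""
    return lambda true_preds: true_preds == S

# Hypothetical data: predicates u and w over a 4-element universe.
universe = {0, 1, 2, 3}
interp = {"u": {0, 1}, "w": {1, 2}}
t = lambda true_preds: "u" in true_preds or "w" in true_preds  # t = u ∨ w

# S_t: the sets S whose complete minterm implies t.
S_t = [S for S in subsets(interp) if t(S)]
# Cardinality of each complete minterm; these values sum to |t|.
parts = {S: card(complete_minterm(S), universe, interp) for S in S_t}
```

Since the complete minterms are pairwise incompatible, the values in `parts` add up to `card(t, universe, interp)`, which is why a lower bound ℓ on |*t*| can be distributed over the minterms.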

Notice that restricting the sets of predicates in S<sub>*t*</sub> to subsets of P(φ), instead of the entire set of predicates, allows us to apply Proposition 1 and reduce the number of complete minterms to be considered. That is, whenever possible, we write each minterm ⋀<sub>*i*∈*L*</sub> |*t<sub>i</sub>*| ≥ ℓ<sub>*i*</sub> ∧ ⋀<sub>*j*∈*U*</sub> |*t<sub>j</sub>*| ≤ *u<sub>j</sub>* in the DNF of φ as ψ<sub>1</sub> ∧ ... ∧ ψ<sub>*k*</sub>, such that P(ψ<sub>*i*</sub>) ∩ P(ψ<sub>*j*</sub>) = ∅ for all 1 ≤ *i* < *j* ≤ *k*. In practice, this optimisation turns out to be quite effective, as shown by the small execution times of our test cases, reported in Sect. 5.

The second step is building, for each conjunction *C* = ⋀{ℓ<sub>*S*</sub> ≤ |*t*<sub>*S*</sub><sup>P(φ)</sup>| ∧ |*t*<sub>*S*</sub><sup>P(φ)</sup>| ≤ *u<sub>S</sub>* | *S* ⊆ P(φ)}<sup>8</sup>, as above, a positive formula *C*<sup>⊕</sup> that preserves its set of minimal models [[*C*]]<sup>μ</sup>. The generalization to arbitrary boolean combinations of cardinality constraints is a direct consequence of Proposition 1. Let L<sup>+</sup>(φ) (resp. L<sup>−</sup>(φ)) be the set of positive boolean combinations of predicate symbols *p* ∈ P<sup>+</sup>(φ) (resp. ¬*p*, where *p* ∈ P<sup>−</sup>(φ)). Further, for a complete minterm *t*<sub>*S*</sub><sup>P</sup>, we write *t*<sub>*S*</sub><sup>P+</sup> (resp. *t*<sub>*S*</sub><sup>P−</sup>) for the conjunction of the positive (resp. negative) literals in *t*<sub>*S*</sub><sup>P</sup>. Then, we define:

$$C^{\oplus} \stackrel{\text{def}}{=} \bigwedge \Big\{ |\tau| \ge \sum_{t_S^{\mathcal{P}(\phi)^{+}} \to \tau} \ell_S \;\Big|\; \tau \in \mathcal{L}^{+}(\phi) \Big\} \wedge \bigwedge \Big\{ |\tau| \le \sum_{t_S^{\mathcal{P}(\phi)^{-}} \to \tau} u_S \;\Big|\; \tau \in \mathcal{L}^{-}(\phi) \Big\}$$

It is not hard to see that *C*<sup>⊕</sup> is a positive MIL formula, because:

<sup>7</sup> The constraints |*t*| ≤ *u* are dealt with as ¬(|*t*| ≥ *u* + 1).

<sup>8</sup> Missing lower bounds ℓ<sub>*S*</sub> are replaced with 0 and missing upper bounds *u<sub>S</sub>* with ∞.


The following lemma proves that the above definition meets the second requirement of positivation operators, concerning the preservation of minimal models.

**Lemma 6.** *Given* P *a finite set of monadic predicate symbols,* {ℓ<sub>*S*</sub> ∈ ℕ}<sub>*S*⊆P</sub> *and* {*u<sub>S</sub>* ∈ ℕ ∪ {∞}}<sub>*S*⊆P</sub> *sets of constants, for any conjunction* *C* = ⋀{ℓ<sub>*S*</sub> ≤ |*t*<sub>*S*</sub><sup>P</sup>| ∧ |*t*<sub>*S*</sub><sup>P</sup>| ≤ *u<sub>S</sub>* | *S* ⊆ P}*, we have* *C* ≡<sup>μ</sup> *C*<sup>⊕</sup>*.*

*Example 4* (contd. from Example 3).

Consider the first minterm of the DNF of the cardinality constraint obtained by quantifier elimination in Example 3, from the system in Fig. 1b. The result of positivation for this minterm is given below:

$$\left(\neg r \land \neg s \land |w \land \neg u| \le 0 \land |u \land \neg w| \le 0 \land 1 \le |w|\right)^{\oplus} = 1 \le |u \land w|$$

Intuitively, the negative literals ¬*r* and ¬*s* may safely disappear, because no minimal model will assign *r* or *s* to true. Further, the constraints |*w* ∧ ¬*u*| ≤ 0 and |*u* ∧ ¬*w*| ≤ 0 are equivalent to the fact that, in any structure I = (U, ν, ι), we must have ι(*u*) = ι(*w*). Finally, because |*w*| ≥ 1, then necessarily |*u* ∧ *w*| ≥ 1.

Similarly, the result of positivation applied to the second conjunct of the DNF cardinality constraint corresponding to the system in Fig. 3 is given below:

$$\left(2 \le |w| \land |w \land \neg u| \le 1 \land |u \land \neg w| \le 0\right)^{\oplus} = 2 \le |w| \land 1 \le |u \land w|$$

Here, the number of elements in *w* is at least 2 and, in any structure I = (U, ν, ι), we must have ι(*u*) ⊆ ι(*w*) and at most one element in ι(*w*) \ ι(*u*). Consequently, the intersection of the sets ι(*u*) and ι(*w*) must contain at least one element, i.e. |*u* ∧ *w*| ≥ 1.
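The second positivation can be sanity-checked by enumerating all interpretations of *u* and *w* over small universes. The Python sketch below is illustrative only. It assumes that a model is minimal when no other model over the same universe interprets both predicates by (pointwise) subsets, and the helper names are hypothetical.

```python
from itertools import combinations, product

def subsets(universe):
    """All subsets of the universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def minimal_models(formula, universe):
    """Models (u, w) of `formula` with no strictly smaller model (pointwise ⊆)."""
    models = [(u, w) for u, w in product(subsets(universe), repeat=2)
              if formula(u, w)]
    def smaller(a, b):
        return a != b and a[0] <= b[0] and a[1] <= b[1]
    return {m for m in models if not any(smaller(o, m) for o in models)}

# 2 ≤ |w| ∧ |w ∧ ¬u| ≤ 1 ∧ |u ∧ ¬w| ≤ 0
original = lambda u, w: len(w) >= 2 and len(w - u) <= 1 and len(u - w) <= 0
# 2 ≤ |w| ∧ 1 ≤ |u ∧ w|
positive = lambda u, w: len(w) >= 2 and len(u & w) >= 1

# Compare the minimal models of both formulae on universes of size 1 to 4.
same = all(minimal_models(original, range(n)) == minimal_models(positive, range(n))
           for n in range(1, 5))
```

Such an enumeration only covers the universes it is run on; it is a plausibility check for the example, not a substitute for Lemma 6.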

#### **4 Proving Deadlock Freedom of Parametric Systems**

We have gathered all the ingredients necessary for checking deadlock freedom of parametric systems, using our method based on trap invariant generation (Fig. 4). First, we derive a trap constraint Θ(Γ) directly from the interaction formula Γ, both of which are written in MIL. Second, we compute a positive formula that preserves the set of minimal models of Θ(Γ) ∧ *Init*(S), by first converting the MIL formula into a quantifier-free cardinality constraint, using quantifier elimination, and then deriving a positive MIL formula from the latter.

The conjunction between the dual of this positive formula and the formula Δ(Γ) that defines the deadlock states is then checked for satisfiability. Formally, given a parametric system S, with an interaction formula Γ written in the form (1), the MIL formula characterizing the deadlock states of the system is the following:

$$\Delta(\Gamma) \stackrel{\text{def}}{=} \forall i_1 \dots \forall i_\ell \,.\, \varphi \to \left[ \bigvee_{j=1}^{\ell} \neg {}^{\bullet}p_j(i_j) \lor \bigvee_{j=\ell+1}^{\ell+m} \exists i_j \,.\, \psi_j \wedge \neg {}^{\bullet}p_j(i_j) \right]$$

We state a sufficient verification condition for deadlock freedom in the parametric case:

**Fig. 4.** Verification of parametric component-based systems

**Corollary 2.** *A parametric system* S = ⟨C<sup>1</sup>, ..., C<sup>*n*</sup>, M, Γ⟩ *is deadlock-free if*

$$\left( (\Theta(\Gamma) \land \mathit{Init}(\mathcal{S}))^{\oplus} \right)^{\sim} \land \Delta(\Gamma) \to \bot$$

The satisfiability check is carried out using the conversion to cardinality constraints via quantifier elimination (Sect. 3.1) and an effective set theory solver for cardinality constraints, implemented in the CVC4 SMT solver [6].

#### **5 Experimental Results**

To assess our method for proving deadlock freedom of parametric component-based systems, we ran a number of experiments on systems with a small number of rather simple component types, but with nontrivial interaction patterns, given by MIL formulae. The task-sem *i*/*n* examples, *i* = 1, 2, 3, are generalizations of the parametric *Task*-*Semaphore* example depicted in Fig. 1b, in which *n Task*s synchronize using *n Semaphore*s, such that *i Task*s interact with a single *Semaphore* at once, in a multiparty rendez-vous. In a similar vein, the broadcast *i*/*n* examples, *i* = 2, 3, are generalizations of the system in Fig. 3, in which *i* out of *n Worker*s engage in rendez-vous on the *b* port, whereas all the others stay idle; here idling is modeled as a broadcast on the *a* ports. Finally, in the sync *i*/*n* examples, *i* = 1, 2, 3, we consider systems composed of *n Worker*s (Fig. 1b) such that either *i* out of *n* instances simultaneously interact on the *b* ports, or all interact on the *f* ports. Notice that, for *i* = 2, 3, these systems have a deadlock if and only if *n* ≠ 0 mod *i*. This is because, if *n* = *m* mod *i*, for some 0 < *m* < *i*, there will be *m* instances that cannot synchronize on their *b* port in order to move from *w* to *u* and then engage in the *f* broadcast.
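The deadlock condition for the sync *i*/*n* family can be reproduced on a simple counter abstraction. The following Python sketch is not the verification method of this paper. It assumes that each *Worker* is either in state *w* or in state *u*, that an interaction on *b* moves exactly *i* workers from *w* to *u*, and that the *f* broadcast is enabled only when all *n* workers are in *u* and moves them all back to *w*. The function name is hypothetical.

```python
def has_deadlock(n, i):
    """Explore the counter abstraction; True if a reachable state has no move."""
    seen, frontier = set(), [0]          # initially all n workers are in w
    while frontier:
        k = frontier.pop()               # k = number of workers currently in u
        if k in seen:
            continue
        seen.add(k)
        successors = []
        if n - k >= i:                   # i workers in w synchronise on b
            successors.append(k + i)
        if k == n:                       # all workers in u: broadcast on f
            successors.append(0)
        if not successors:               # neither b nor f is enabled
            return True
        frontier.extend(successors)
    return False
```

Under these assumptions, `has_deadlock(n, i)` agrees with the condition *n* ≠ 0 mod *i* for the instances it is run on, for example `has_deadlock(5, 2)` holds while `has_deadlock(4, 2)` does not.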

All experiments were carried out on an Intel(R) Xeon(R) CPU @ 2.00 GHz virtual machine with 4 GB of RAM. Table 1 shows separately the times needed to generate the proof obligations (trap invariants and deadlock states) from the interaction formulae and the times needed by CVC4 1.7 to show unsatisfiability or come up with a model. All systems considered for which deadlock freedom could not be shown using our method have a real deadlock scenario that manifests only under certain modulo constraints on the number *n* of instances. These constraints cannot be captured by MIL formulae or, equivalently, by cardinality constraints, and would require cardinality constraints of the form |*t*| = *n* mod *m*, for some constants *n*, *m* ∈ ℕ.

**Table 1.** Benchmarks

### **6 Conclusions**

This work is part of a lasting research program on BIP linking two work directions: (1) recent work on modeling architectures using interaction logics, and (2) older work on verification by using invariants. Its rationale is to overcome as much as possible complexity and undecidability issues by proposing methods which are adequate for the verification of essential system properties.

The presented results are applicable to a large class of architectures characterized by the MIL. A key technical result is the translation of MIL formulas into cardinality constraints. On the one hand, this allows the computation of the MIL formula characterizing the minimal trap invariant. On the other hand, it provides a decision procedure for MIL that leverages recent advances in SMT, implemented in the CVC4 solver [6].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **The mCRL2 Toolset for Analysing Concurrent Systems: Improvements in Expressivity and Usability**

Olav Bunte<sup>1</sup>, Jan Friso Groote<sup>1(B)</sup>, Jeroen J. A. Keiren<sup>1,2</sup>, Maurice Laveaux<sup>1</sup>, Thomas Neele<sup>1</sup>, Erik P. de Vink<sup>1</sup>, Wieger Wesselink<sup>1</sup>, Anton Wijs<sup>1</sup>, and Tim A. C. Willemse<sup>1</sup>

<sup>1</sup> Eindhoven University of Technology, Eindhoven, The Netherlands, j.f.groote@tue.nl

<sup>2</sup> Open University of the Netherlands, Heerlen, The Netherlands

**Abstract.** Reasoning about the correctness of parallel and distributed systems requires automated tools. By now, the mCRL2 toolset and language have been developed over a course of more than fifteen years. In this paper, we report on the progress and advancements over the past six years. Firstly, the mCRL2 language has been extended to support the modelling of probabilistic behaviour. Furthermore, the usability has been improved with the addition of refinement checking, counterexample generation and a user-friendly GUI. Finally, several performance improvements have been made in the treatment of behavioural equivalences. Besides the changes to the toolset itself, we cover recent applications of mCRL2 in software product line engineering and the use of domain specific languages (DSLs).

### **1 Introduction**

Parallel programs and distributed systems are becoming increasingly common. This is driven by the fact that Dennard's scaling theory [17], stating that every new processor core is expected to provide a performance gain over older cores, does not hold any more, and instead performance is to be gained from exploiting multiple cores. Consequently, distributed system paradigms such as cloud computing have grown popular. However, designing parallel and distributed systems correctly is notoriously difficult. Unfortunately, it is all too common to observe flaws such as data loss and hanging systems. Although these may be acceptable for many non-critical applications, the occasional hiccup may be impermissible for critical applications, *e.g.*, when giving rise to increased safety risks or financial loss.

The mCRL2 toolset is designed to reason about concurrent and distributed systems. Its language [27] is based on a rich, ACP-style process algebra and has an axiomatic view on processes. The data theory is rooted in the theory of abstract data types (ADTs). The toolset consists of over sixty tools supporting visualisation, simulation, minimisation and model checking of complex systems.

In this paper, we present an overview of the mCRL2 toolset in general, focussing on the developments from the past six years. We first present a cursory overview of the mCRL2 language, and discuss the recent addition of support for modelling and analysing *probabilistic processes.*

*Behavioural equivalences* such as strong and branching bisimulation are used to reduce and compare state spaces of complex systems. Recently, the complexity of branching bisimulation has been significantly improved from O(*mn*) to O(*m*(log |*Act*| + log *n*)), where *m* is the number of transitions, *n* the number of states, and *Act* the set of actions. This was achieved by implementing the new algorithm by Groote *et al.* [24]. Additionally, support for checking (weak) failures refinement and failures divergence refinement has been added.

*Model checking* in mCRL2 is based on parameterised boolean equation systems (PBES) [33] that combine information from a given mCRL2 specification and a property in the modal μ-calculus. Solving the PBES answers the encoded model checking problem. Recent developments include improved static analysis of PBESs using liveness analysis, and solving PBESs for infinite-state systems using symbolic quotienting algorithms and abstraction. One of the major features recently introduced is the ability to generate comprehensive counterexamples in the form of a subgraph of the original system.

To aid novice users of mCRL2, an alternative graphical user-interface (GUI), mcrl2ide, has been added, that contains a text editor to create mCRL2 specifications, and provides access to the core functionality of mCRL2 without requiring the user to know the interface of each of the sixty tools. The use of the language and tools is illustrated by means of a selection of case studies conducted with mCRL2. We focus on the application of the tools as a verification back-end for domain specific languages (DSLs), and the verification of software product lines.

The mCRL2 toolset can be downloaded from the website www.mcrl2.org. This includes binaries as well as source code packages<sup>1</sup>. To promote external contributions, the source code of mCRL2 and the corresponding issue tracker have been moved to GitHub.<sup>2</sup> The mCRL2 toolset is open source under the permissive Boost license, that allows free use for any purpose. Technical documentation and a user manual of the mCRL2 toolset, including a tutorial, can be found on the website. An extensive introduction to the mCRL2 language can be found in the textbook *Modeling and analysis of communicating systems* [27].

The rest of the paper is structured as follows. Section 2 introduces the basics of the mCRL2 language and Sect. 3 its probabilistic extension. In Sect. 4, we discuss several new and improved tools for various behavioural relations. Section 5 gives an overview of novel analysis techniques for PBESs, while Sect. 6 introduces mCRL2's improved GUI and Sect. 7 discusses a number of applications. Related work is discussed in Sect. 8, and Sect. 9 presents a conclusion and future plans.

<sup>1</sup> The source code is also archived on https://doi.org/10.5281/zenodo.2555054.

<sup>2</sup> https://github.com/mCRL2org/mCRL2.

#### **2 The mCRL2 Language and Workflow**

The behavioural specification language mCRL2 [27] is the successor of μCRL (micro Common Representation Language [28]) that was in turn a response to a language called CRL (Common Representation Language) that became so complex that it would not serve a useful purpose.

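$$\begin{array}{ll}
\textbf{sort} & \mathit{Content} = \textbf{struct}\ \mathit{bad\_data} \mid \mathit{data}_1 \mid \mathit{data}_2\,; \\
\textbf{act} & \mathit{read}, \mathit{deliver}, \mathit{get}, \mathit{put}, \mathit{pass\_on} : \mathit{Content}\,; \\
\textbf{proc} & \mathit{Filter} = \sum_{c:\mathit{Content}} \mathit{get}(c) \cdot (c \approx \mathit{bad\_data} \to \mathit{Filter} \diamond \mathit{put}(c) \cdot \mathit{Filter})\,; \\
 & \mathit{Queue}(q : \mathit{List}(\mathit{Content})) = \sum_{c:\mathit{Content}} \mathit{read}(c) \cdot \mathit{Queue}(c \triangleright q) + q \not\approx [\,] \to \mathit{deliver}(\mathit{rhead}(q)) \cdot \mathit{Queue}(\mathit{rtail}(q))\,; \\
\textbf{init} & \nabla_{\{\mathit{get}, \mathit{deliver}, \mathit{pass\_on}\}}\, \Gamma_{\{\mathit{put}|\mathit{read} \to \mathit{pass\_on}\}}\, (\mathit{Filter} \parallel \mathit{Queue}([\,]))\,;
\end{array}$$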

**Fig. 1.** A filter process communicating with an infinite queue in mCRL2.

The languages μCRL and mCRL2 are quite similar combinations of process algebra in the style of ACP [8] together with equational abstract data types [19]. A typical example illustrating most of the language features of mCRL2 is given in Fig. 1, which shows a filter process (*Filter*) that iteratively reads data via an action *get* and forwards it to a queue using the action *put* if the data is not bad. The queue (*Queue*) is infinitely sized, reading data via the action *read* and delivering data via the action *deliver*. The processes are put in parallel using the parallel operator ∥. The actions *put* and *read* are forced to synchronise into the action *pass_on* using the communication operator Γ and the allow operator ∇.

The language mCRL2 only contains a minimal set of primitives to express behaviour, but this set is well chosen such that behaviour of communicating systems can be easily expressed. Both μCRL and mCRL2 allow systems with time to be expressed, using positive real time tags to indicate when an action takes place. Recently the possibility has been added to express probabilistic behaviour in mCRL2, which will be explained in Sect. 3.

The differences between μCRL and mCRL2 are minor but significant. In mCRL2 the if-then-else is written as *c* → *p* ⋄ *q* (was *p* ◁ *c* ▷ *q*). mCRL2 allows for multi-actions, e.g., *a*|*b*|*c* expresses that the actions *a*, *b* and *c* happen at the same time. mCRL2 does not allow multiple actions with the same time tag to happen consecutively (μCRL does, as do most other process specification formalisms with time). Finally, mCRL2 has built-in standard datatypes and mechanisms to specify datatypes far more compactly, and it allows for function datatypes, including lambda expressions, as well as arbitrary sets and bags.

The initial purpose of μCRL was to have a mathematical language to model realistic protocols and distributed systems of which the correctness could be proven manually using process algebraic axioms and rules, as well as the equations for the equational data types. The result of this is that mCRL2 is equipped with a nice fundamental theory as well as highly effective proof methods [29,30], which have been used, for instance, to provide a concise, computer checked proof of the correctness of Tanenbaum's most complex sliding window protocol [1].

When the language μCRL began to be used for specifying actual systems [20], it became obvious that such behavioural specifications are too large to analyse by hand and that tools were required, so a toolset was developed. It also became clear that specifications of actual systems are hard to give without flaws, and verification is needed to eliminate those flaws. In the early days verification had the form of proving that an implementation and a specification were (branching) bisimilar.

Often it is more convenient to prove properties about aspects of the behaviour. For this purpose mCRL2 was extended with a modal logic, in the form of the modal μ-calculus with data and time. A typical example of a formula in modal logic is the following:

$$\begin{array}{c} \nu X(n: \mathbb{N} = 0). \forall m: \mathbb{N}. [enter(m)] X(n+m) \land \\ \quad \forall m: \mathbb{N}. [extract(m)] (m \le n \land X(n-m)) \end{array}$$

which says that the amount extracted using actions *extract* can never exceed the cumulative amount entered via the action *enter* . The modal μ-calculus with data is far more expressive than languages such as LTL and CTL\*, which can be mapped into it [13].
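The meaning of the formula can be illustrated on a single finite trace. The Python sketch below is not how mCRL2 checks the property, since the formula ranges over all behaviours of a process and is verified via a PBES. It merely tracks the parameter *n* of *X* along one trace. The trace encoding as pairs and the function name are assumptions made for this example.

```python
def satisfies_on_trace(trace):
    """Check, along one finite trace, that no extract exceeds the running balance."""
    n = 0                                # parameter n of X, initially 0
    for action, m in trace:
        if action == "enter":
            n += m                       # [enter(m)] X(n + m)
        elif action == "extract":
            if m > n:                    # violates the conjunct m <= n
                return False
            n -= m                       # continue with X(n - m)
    return True

ok = satisfies_on_trace([("enter", 3), ("extract", 2), ("extract", 1)])
bad = satisfies_on_trace([("enter", 1), ("extract", 2)])
```

Here `ok` holds, while `bad` does not, because the second trace extracts more than was entered.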

**Fig. 2.** The mCRL2 model checking workflow

Verification of modal formulae is performed through transformations to *linear process specifications* (LPSs) and *parameterised boolean equation systems* (PBESs) [25,33]. See Fig. 2 for the typical model checking workflow. An LPS is a process in normal form, where all state behaviour is translated into data parameters. An LPS essentially consists of a set of condition-action-effect rules saying which action can be done in which state, and as such is a symbolic representation of a state space. A PBES is constructed using a modal formula and a linear process. It consists of a parameterised sequence of boolean fixed point equations. A PBES can be solved to obtain an answer to the question whether the mCRL2 specification satisfies the supplied formula. For more details on PBESs and the generation of evidence, refer to Sect. 5.

Whereas an LPS is a symbolic description of the behaviour of a system, a *labelled transition system* (LTS), makes this behaviour explicit. An LTS can be defined in the context of a set of action labels. The LTS itself consists of a set of states, an initial state, and a transition relation between states where each transition is labelled by an action. The mCRL2 toolset contains the lps2lts tool to obtain the LTS from a given LPS by means of state space exploration. The resulting LTS contains all reachable states of this LPS and the transition relation defining the possible actions in each state. The mCRL2 toolset provides tools for visualising and reducing LTSs and also for comparing LTSs in a pairwise manner. For more details on reducing and comparing LTSs, refer to Sect. 4.
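The step from condition-action-effect rules to an explicit LTS can be sketched as a breadth-first exploration. The Python code below is a simplified illustration and not the implementation of lps2lts. Real LPS summands also carry data parameters on actions and sums over data, which are omitted here, and the two-place buffer is a hypothetical example.

```python
from collections import deque

def explore(initial, rules):
    """Breadth-first generation of the reachable LTS.

    `rules` is a list of (condition, action, effect) triples, where condition
    and effect are functions on states and action is a label.
    """
    states, transitions = {initial}, set()
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        for condition, action, effect in rules:
            if condition(s):             # the rule is enabled in state s
                t = effect(s)
                transitions.add((s, action, t))
                if t not in states:      # newly discovered state
                    states.add(t)
                    queue.append(t)
    return states, transitions

# Hypothetical example: a buffer holding at most two items.
rules = [
    (lambda n: n < 2, "put", lambda n: n + 1),
    (lambda n: n > 0, "get", lambda n: n - 1),
]
states, transitions = explore(0, rules)
```

The exploration terminates only if the reachable state space is finite, which is the case for the bounded buffer above but not, for instance, for the unbounded queue of Fig. 1.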

#### **3 Probabilistic Extensions to mCRL2**

A recent addition to the mCRL2 language is the possibility to specify probabilistic processes using the construct **dist** x:D[ *dist*(x) ].p(x) which behaves as the process p(x) with probability *dist*(x). The distribution *dist* may be discrete or continuous. For example, a process describing a light bulb that fails according to a negative exponential distribution of rate λ is described as

$$\textbf{dist}\ r{:}\mathbb{R} \left[\, \mathit{if}(r \ge 0,\ \lambda e^{-\lambda r},\ 0) \,\right] .\ \mathit{fail}\,@\,r$$

where *fail*@*r* is the notation for the action *fail* that takes place at time *r*.

The modelling of probabilistic behaviour with the probabilistic extension of mCRL2 can be rather insightful as advocated in [32]. There it is illustrated for the Monty Hall problem and the so-called "problem of the lost boarding pass" how strong probabilistic bisimulation and reduction modulo probabilistic weak trace equivalence can be applied to visualise the *probabilistic LTS* (PLTS) of the underlying probabilistic process as well as to establish the probability of reaching a target state (or set of states). We illustrate this by providing the description and state space of the Monty Hall problem here.

In the Monty Hall problem, there are three doors, one of which is hiding a prize. A player can select a door. Then one of the remaining doors that does not hide the prize is opened. The player can then decide to select the other door. If he does so, he will get the prize with probability 2/3. The action *prize*(*true*) indicates that a prize is won. The action *prize*(*false*) is an indication that no prize is obtained. A possible model in mCRL2 is given below. In this model the player switches doors. So, the prize is won if the initially selected door was not the door with the prize.

**Fig. 3.** The non-reduced and reduced state space of the Monty Hall problem. At the left the label abbreviates prize(*true*) and *×* stands for prize(*false*)

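```
% ASCII mCRL2 syntax; the action declaration is added to make the model self-contained.
sort Doors = struct door1 | door2 | door3;
act  prize: Bool;
init dist door_with_prize: Doors[1/3] .
     dist initially_selected_door: Doors[1/3] .
     % The player switches, so the prize is won iff the first pick was wrong.
     prize(initially_selected_door != door_with_prize) . delta;
```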
The generated state space for this model is given in Fig. 3 at the left. From probabilistic mCRL2 processes probabilistic transition systems can be generated, which can be reduced modulo strong probabilistic bisimulation [26] (see the next section). The reduced transition system is provided at the right, and clearly shows that the prize is won with probability 2/3.
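The probability of winning can be recomputed by enumerating the nine equally likely combinations of prize door and initial selection. The Python sketch below is an independent check written for illustration and does not use the mCRL2 tools. The door names are taken from the model, and `p_win` is a hypothetical variable name.

```python
from fractions import Fraction
from itertools import product

doors = ["door1", "door2", "door3"]

# Both choices are uniform, so each combination has probability 1/3 * 1/3.
p_win = sum(
    Fraction(1, 3) * Fraction(1, 3)
    for prize_door, selected in product(doors, repeat=2)
    if selected != prize_door            # the player switches, so a wrong first pick wins
)
```

Six of the nine combinations have a wrong initial selection, so `p_win` equals `Fraction(2, 3)`.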

Moreover, modal mu-calculus formulae yielding a probability, *i.e.* a real number, can be evaluated by invoking probabilistic counterparts of the central tools in the toolset. For the Monty Hall model the modal formula ⟨*prize*(*true*)⟩*true* will evaluate to the probability 2/3. The tool that verified this modal formula is presented in [10]. Although the initial results are promising, the semantic and axiomatic underpinning of the process theory for probabilities is demanding.

### **4 Behavioural Relations**

Given two LTSs, the ltscompare tool can check whether they are related according to one of a number of equivalence and refinement relations. Additionally, the ltsconvert tool can reduce a given LTS modulo an equivalence relation. In the following subsections the recently added implementations of several equivalence and refinement relations are described.

#### **4.1 Equivalences**

The ltscompare tool can check simulation equivalence, and (weak) trace equivalence between LTSs. In the latest release an algorithm for checking ready simulation was implemented and integrated into the toolset [23]. Regarding bisimulations, the tool can furthermore check strong, branching and weak bisimulation between LTSs. The latter two are sensitive to so-called *internal* behaviour, represented by the action τ . *Divergence-preserving* variants of these bisimulations are supported, which take the ability to perform infinite sequences of internal behaviour into account. The above mentioned equivalences can also be used by the ltsconvert tool.

Recently, the Groote/Jansen/Keiren/Wijs algorithm (GJKW) for branching bisimulation [24], with complexity O(*m*(log |*Act*| + log *n*)), was implemented. When tested in practice, it frequently demonstrates performance improvements by a factor of 10, and occasionally by a factor of 100 over the previous algorithm by Groote and Vaandrager [31].

The improved complexity is the result of combining the *process the smaller half* principle [35] with the key observations made by Groote and Vaandrager regarding internal transitions [31]. GJKW uses partition refinement to identify all classes of equivalent states. Repeatedly, one class (or *block*) *B* is selected to be the so-called *splitter*, and each block *B′* is checked for the reachability of *B*, where internal behaviour should be skipped over. In case *B* is reachable from some states in *B′* but not from others, *B′* needs to be split into two subblocks, separating the states from which *B* can and cannot be reached. Whenever a fixed-point is reached, the obtained partition defines the equivalence relation.
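The basic splitter loop of partition refinement can be sketched as follows. This Python code is a deliberately naive illustration for strong bisimulation only: it ignores internal behaviour, does not apply the *process the smaller half* principle, and is therefore not GJKW. The function name and the encoding of transitions as triples are assumptions.

```python
def refine(states, transitions):
    """Naive splitter-based partition refinement for strong bisimulation.

    transitions: set of (source, action, target) triples.
    Returns a set of frozensets (the equivalence classes).
    """
    partition = {frozenset(states)}
    actions = {a for _, a, _ in transitions}
    changed = True
    while changed:                       # repeat until a fixed point is reached
        changed = False
        for splitter in list(partition):
            for a in actions:
                # States with an a-transition into the splitter.
                pre = {s for s, b, t in transitions if b == a and t in splitter}
                for block in list(partition):
                    inside, outside = block & pre, block - pre
                    if inside and outside:       # block is not stable: split it
                        partition.remove(block)
                        partition.update({frozenset(inside), frozenset(outside)})
                        changed = True
    return partition

# Hypothetical LTS: states 1 and 2 behave identically.
lts = {(0, "a", 1), (0, "a", 2), (1, "b", 3), (2, "b", 3)}
classes = refine({0, 1, 2, 3}, lts)
```

For this small LTS the resulting classes are {0}, {1, 2} and {3}. Each pass scans all transitions for every splitter, which is far from the complexity bound discussed above.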

GJKW applies *process the smaller half* in two ways. First of all, it is ensured that each time a state s is part of a splitter B, the size of B, in terms of number of states, is at most half the size of the previous splitter in which s resided. To do this, blocks are partitioned in *constellations*. A block is selected as splitter iff its size is at most half the number of states in the constellation in which it resides. When a splitter is selected, it is moved into its own, new, constellation, and when a block is split, the resulting subblocks remain in the same constellation.

Second, it has to be ensured that splitting a block *B′* takes time proportional to the smallest resulting subblock. To achieve this, two state selection procedures are executed in lockstep, one identifying the states in *B′* that can reach the splitter, and one detecting the other states. Once one of these procedures has identified all its states, those states can be split off from *B′*.

Reachability checking is performed efficiently by using the notion of *bottom state* [31], which is a state that has no outgoing internal transitions leading to a state in the same block. It suffices to check whether any bottom state in *B′* can reach *B*. Hence, it is crucial that for each block, the set of bottom states is maintained during execution of the algorithm.
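The definition of a bottom state translates directly into a set computation. The Python sketch below recomputes the bottom states of one block from scratch, whereas the algorithm described here maintains them incrementally. The label `"tau"` for internal transitions and the function name are assumptions made for this example.

```python
def bottom_states(block, transitions, tau="tau"):
    """States of `block` without an outgoing tau-transition to a state in `block`."""
    inert_sources = {s for s, a, t in transitions
                     if a == tau and s in block and t in block}
    return set(block) - inert_sources

# Hypothetical block {0, 1, 2}: only state 0 has an internal step inside the block.
example = bottom_states({0, 1, 2}, {(0, "tau", 1), (1, "a", 3), (2, "tau", 3)})
```

In the example, state 2 remains a bottom state because its internal transition leaves the block.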

GJKW is very complicated due to the amount of bookkeeping needed to achieve the complexity. Among other things, a data structure by Valmari, called *refinable partition* [46], is used, together with three copies of all transitions, structured in different ways to allow fast retrieval in the various stages of the algorithm.

Besides checking for branching bisimulation, GJKW is used as a basis for checking strong bisimulation (in which case it corresponds to the Paige-Tarjan algorithm [41]) and as a preprocessing step for checking weak bisimulation.

For the support of the analysis of probabilistic systems, a number of preliminary extensions have been made to the mCRL2 toolset. In particular, a new algorithm has been added to reduce PLTSs (containing both non-deterministic and probabilistic choice [44]) modulo strong probabilistic bisimulation. This new Paige-Tarjan style algorithm, called GRV [26] and implemented in the tool ltspbisim, improves upon the complexity of the best known algorithm so far by Baier *et al.* [2]. The GRV algorithm was inspired by work on lumping of Markov Chains by Valmari and Franceschinis [47] to limit the number of times a probabilistic transition needs to be sorted. Under the assumption of a bounded fan-out for probabilistic states, the time complexity of GRV is O(*n<sub>p</sub>* log *n<sub>a</sub>*), with *n<sub>p</sub>* equal to the number of probabilistic transitions and *n<sub>a</sub>* being the number of non-deterministic states in a PLTS.

#### **4.2 Refinement**

In model checking there is typically a single model on which properties, defined in another language, are verified. An alternative approach that can be employed is *refinement* checking. Here, the correctness of the model is verified by establishing a refinement relation between an implementation LTS and a specification LTS. The chosen refinement relation must be strong enough to preserve the desired properties of the model, but also weak enough to allow many valid implementations.

For refinement relations the ltscompare tool can check the asymmetric variants of simulation, ready simulation and (weak) trace equivalence between LTSs. In the latest release, several algorithms have been added to check (weak) trace, (weak) failures and failures-divergences refinement relations based on the algorithms introduced in [48]. We remark that weak failures refinement is known as stable failures refinement in the literature. Several improvements have been made to the reference algorithms and the resulting implementation has been successfully used in practice, as described in Sect. 7.1.

The newly introduced algorithms are based on the notion of *antichains*. These algorithms try to find a witness to show that no refinement relation exists. The antichain data structure keeps track of the explored part of the state space and assists in pruning other parts based on an ordering. If no refinement relation exists, the tool provides a counterexample trace to a violating state. To further speed up refinement checking, the tool applies divergence-preserving branching bisimulation reduction as a preprocessing step.
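The role of the antichain can be illustrated for plain trace refinement. The Python sketch below is a simplified illustration and not the implementation in ltscompare: it assumes LTSs without internal actions, omits failures and divergences as well as the bisimulation preprocessing, and uses hypothetical names. It explores pairs of an implementation state and the set of specification states reachable by the same trace, and skips a pair when an already visited pair with the same implementation state has a smaller specification set.

```python
from collections import deque

def trace_refines(impl, spec, impl_init, spec_init):
    """Check trace inclusion of impl in spec; return (True, None) or (False, trace).

    impl, spec: dicts mapping state -> list of (action, successor) pairs.
    """
    start = (impl_init, frozenset([spec_init]))
    antichain = [start]                   # visited pairs, kept minimal w.r.t. ⊆
    queue = deque([(start, [])])
    while queue:
        (s, S), trace = queue.popleft()
        for action, s2 in impl.get(s, []):
            S2 = frozenset(t2 for t in S for a, t2 in spec.get(t, []) if a == action)
            if not S2:                    # spec cannot match: counterexample found
                return False, trace + [action]
            # Prune: a visited pair (s2, T) with T ⊆ S2 already covers (s2, S2).
            if any(p == s2 and T <= S2 for p, T in antichain):
                continue
            antichain = [(p, T) for p, T in antichain if not (p == s2 and S2 <= T)]
            antichain.append((s2, S2))
            queue.append(((s2, S2), trace + [action]))
    return True, None

# Hypothetical LTSs: the implementation can do "b" after "a", the specification cannot.
impl = {0: [("a", 1)], 1: [("b", 0)]}
spec = {0: [("a", 1)]}
result = trace_refines(impl, spec, 0, 0)
```

Pruning is sound because a smaller set of specification states can only make a violation easier to find, so any counterexample reachable from the skipped pair is also reachable from the pair that covers it.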

### **5 Model Checking**

Behavioural properties can be specified in a first-order extension of the modal μ-calculus. The problem of deciding whether a μ-calculus property holds for a given mCRL2 specification is converted to a problem of (partially) solving a PBES. Such an equation system consists of a sequence of parameterised fixpoint equations of the form (σX(d<sub>1</sub>:D<sub>1</sub>, ..., d<sub>n</sub>:D<sub>n</sub>) = φ), where σ is either a least (μ) or greatest (ν) fixpoint, X is an n-ary typed second-order recursion variable, each d<sub>i</sub> is a parameter of type D<sub>i</sub> and φ is a predicate formula (technically, a first-order formula with second-order recursion variables). The entire translation is syntax-driven, *i.e.*, linear in the size of the linear process specification and the property. We remark that mCRL2 also comes with tools that encode decision problems for behavioural equivalences as equation system solving problems; moreover, mCRL2 offers similar translations operating on labelled transition systems instead of linear process specifications.

#### **5.1 Improved Static Analysis of Equation Systems**

The parameters occurring in an equation system are derived from the parameters present in process specifications and first-order variables present in μ-calculus formulae. Such parameters typically determine the set of second-order variables on which another second-order variable in an equation system depends. Most equation system solving techniques rely on explicitly computing these dependencies. Obviously, such techniques fail when the set of dependencies is infinite. Consider, for instance, the equation system depicted below:

$$\begin{array}{l} \nu X(i, k:N) = (i \neq 1 \vee X(1, k+1)) \wedge \forall m \colon N. \ Y(2, k+m) \\\mu Y(i, k:N) = (k < 10 \vee i = 2) \wedge (i \neq 2 \vee Y(1, 1)) \end{array}$$

Observe that the solution to X(1, 1), which is *true*, depends on the solution to X(1, 2), but also on the solution to Y(2, 1 + m) for all m, see Fig. 4. Consequently, techniques that rely on explicitly computing the dependencies will fail to compute the solution to X(1, 1).

**Fig. 4.** Dependencies of second-order recursion variables on other second-order recursion variables in an equation system.

Not all parameters are 'used' equally in an equation system: some parameters may only influence the truth-value of a second-order variable, whereas others may also influence whether an equation depends on second-order variables. For instance, in our example, the parameter i of X determines when there is a dependency of X on X, and in the equation for *Y*, parameter i determines when there is a dependency of Y on *Y*. The value for parameter k, however, is only of interest in the equation for *Y*, where it immediately determines its solution when i ≠ 2: it will be *true* when k < 10 and *false* otherwise. For i = 2, the value of k is immaterial. As suggested by the dependency graph in Fig. 4, for X(1, 1), the only dependency that is ultimately of consequence is the dependency on Y(1, 1), *i.e.*, k = 1; other values for k cannot be reached.

The techniques implemented in the pbesstategraph tool, and which are described in [37], perform a *liveness analysis* for data variables, such as k in our example, and reset these values to default values when their actual value no longer matters. To this end, a static analysis determines a set of *control flow parameters* in an equation system. Intuitively, a control flow parameter is a parameter in an equation for which we can statically detect that it can assume only a finite number of distinct values, and that its values determine which occurrences of recursion variables in an equation are relevant. Such control flow parameters are subsequently used to approximate the dependencies of an equation system, and compute the set of data variables that are still *live*. As soon as a data variable switches from live to not live, it can be set to a default, pre-determined value.

In our example, parameter i in equations X and Y is a control flow parameter that can take on value 1 or 2. Based on a liveness analysis one can conclude that the second argument of both recursion variable occurrences in the equation for X, namely X(1, k+1) and Y(2, k+m), can be reset, leading to an equation system that has the same solution as the original one:

$$\begin{array}{l} \nu X(i, k:N) = (i \neq 1 \vee X(1, 1)) \wedge \forall m \colon N. \ Y(2, 1) \\ \mu Y(i, k:N) = (k < 10 \vee i = 2) \wedge (i \neq 2 \vee Y(1, 1)) \end{array}$$

Observe that there are only a finite number of dependencies in the above equation system, as the universally quantified variable m no longer induces an infinite set of dependencies. Consequently, it can be solved using techniques that rely on computing the dependencies in an equation system. The experiments in [37] show that pbesstategraph in general speeds up solving when it is able to reduce the underlying set of dependencies in an equation system, and when it is not able to do so, the overhead caused by the analysis is typically small.
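The effect of the reset can be reproduced with a small Python sketch. It is not how pbesstategraph or the mCRL2 solvers work internally; it only contrasts the dependencies of the two example systems and then solves the finite one by naive nested fixpoint iteration. The encoding of instantiated variables as tuples, the truncation of the quantifier over m to a finite range, and all function names are assumptions made for illustration.

```python
from collections import deque

def deps_original(node, m_bound):
    """Dependencies in the original system; m is truncated to range(m_bound)."""
    name, i, k = node
    if name == "X":
        out = [("Y", 2, k + m) for m in range(m_bound)]
        if i == 1:
            out.append(("X", 1, k + 1))
        return out
    return [("Y", 1, 1)] if i == 2 else []

def deps_reset(node):
    """Dependencies after resetting the second argument to the default value 1."""
    name, i, k = node
    if name == "X":
        return [("Y", 2, 1)] + ([("X", 1, 1)] if i == 1 else [])
    return [("Y", 1, 1)] if i == 2 else []

def explore(successors, start, max_nodes):
    """Breadth-first exploration; gives up (None) beyond max_nodes variables."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in successors(queue.popleft()):
            if nxt not in seen:
                if len(seen) >= max_nodes:
                    return None
                seen.add(nxt)
                queue.append(nxt)
    return seen

def rhs(node, env):
    """Right-hand sides of the reset system, instantiated for one variable."""
    name, i, k = node
    if name == "X":
        return (i != 1 or env[("X", 1, 1)]) and env[("Y", 2, 1)]
    return (k < 10 or i == 2) and (i != 2 or env[("Y", 1, 1)])

def solve(equations, env):
    """Nested fixpoint iteration that respects the order of the equations."""
    if not equations:
        return env
    (sign, node), rest = equations[0], equations[1:]
    value = (sign == "nu")  # nu starts from true, mu from false
    while True:
        inner = solve(rest, {**env, node: value})
        new_value = rhs(node, inner)
        if new_value == value:
            return inner
        value = new_value

start = ("X", 1, 1)
unbounded = explore(lambda n: deps_original(n, 5), start, 1000)  # gives up
reachable = explore(deps_reset, start, 1000)                     # three variables
equations = [("nu" if n[0] == "X" else "mu", n) for n in sorted(reachable)]
solution = solve(equations, {})
```

Here `unbounded` is `None` because the chain X(1, 1), X(1, 2), ... never closes, whereas `reachable` contains only X(1, 1), Y(2, 1) and Y(1, 1), so that `solution` can be computed.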

#### **5.2 Infinite-State Model Checking**

Two new experimental tools, pbessymbolicbisim [40] and pbesabsinthe [16], support model checking of infinite-state systems. These are two of the few symbolic tools in the toolset. Regular PBES solving techniques, such as those implemented in pbessolve, store each state explicitly, which prohibits the analysis of infinite-state systems. In pbessymbolicbisim, (infinite) sets of states are represented using first-order logic expressions. Instead of straightforward exploration, it performs symbolic partition refinement based on the information about the underlying state space that is contained in the PBES. The approximation of the state space is iteratively refined, until it equals the bisimulation quotient of that state space. Moreover, since the only goal of this tool is to solve a PBES, *i.e.* give the answer *true* or *false*, additional abstraction techniques can be very coarse. As a result, the tool often terminates before the bisimulation quotient has been fully computed.

The second tool, pbesabsinthe, requires the user to specify an abstraction mapping manually. If the abstraction mapping satisfies certain criteria, it will be used to generate a finite underlying graph structure. By solving the graph structure, the tool obtains a solution to the PBES under consideration.

The theoretical foundations of pbessymbolicbisim and pbesabsinthe are similar: pbessymbolicbisim computes an abstraction based on an equivalence relation and pbesabsinthe works with preorder-based abstractions. Both approaches have their own strengths and weaknesses: pbesabsinthe requires the user to specify an abstraction manually, whereas pbessymbolicbisim runs fully automatically. However, the analysis of pbessymbolicbisim can be very costly for larger models. A prime application of pbessymbolicbisim and pbesabsinthe is the verification of real-time systems.

#### **5.3 Evidence Extraction**

One of the major new features of the mCRL2 toolset that, until recently, was lacking is the ability to generate informative counterexamples (resp. witnesses) from a failed (resp. successful) verification. The theory of evidence generation that is implemented is based on that of [15], which explains how to extract diagnostic evidence for μ-calculus formulae via the *Least Fixed-Point* (LFP) logic. The diagnostic evidence that is extracted is a subgraph of the original labelled transition system from which the same proof of a failing (or successful) verification can be reconstructed. Note that since the input language for properties can encode branching-time and linear-time properties, diagnostic evidence cannot always be presented in terms of traces or lassos; for linear-time properties, however, the theory allows trace- and lasso-shaped evidence to be generated.

A straightforward implementation of the ideas of [15] in the setting of equation systems is, however, hampered by the fact that the original evidence theory builds on a notion of *proof graph* that is different from the one developed in [14] for equation systems. In [49], we show that these differences can be overcome by modifying the translation of the model checking problem as an equation system solving problem. This new translation is invoked by passing the flag '-c' to the tool lps2pbes. The new equation system solver pbessolve can be directed to extract and store the diagnostic evidence from an equation system by passing the linear process specification along with this equation system; the resulting evidence, which is stored as a linear process specification, can subsequently be simulated, minimised or visualised for further inspection.

Figure 5, taken from [49], gives an impression of the shape of diagnostic evidence that can be generated using the new tooling. The labelled transition system that is depicted presents the counterexample to a formula for the CERN job storage management system [43] that states that invariantly, each task that is terminated is inevitably removed. Note that this counterexample is obtained by minimising, modulo branching bisimulation, the original evidence of 142 states produced by our tools.

**Fig. 5.** Counterexamples for the requirement that *each task in a terminating state is eventually removed* for the Storage Management Systems. We omitted all edge labels, and the dashed line indicates a lengthy path through a number of other states (not depicted), whereas the dotted transitions are 3D artefacts.

### **6 User-Friendly GUI**

The techniques explained in this paper may not be easily accessible to users that are new to the mCRL2 toolset. This is because the toolset is mostly intended for scientific purposes; at least initially, not much attention had been spent on user friendliness. As the toolset started to get used in workshops and academic courses however, the need for this user friendliness increased. This gave rise to the tools mcrl2-gui, a graphical alternative to the command line usage of the toolset, and mcrl2xi, an editor for mCRL2 specifications. However, to use the functionality of the toolset it was still required to know about the individual tools. For instance, to visualise the state space of an mCRL2 specification, one needed to manually run the tools mcrl22lps, lps2lts and ltsgraph.

As an alternative, the tool mcrl2ide has been added to the mCRL2 toolset. This tool provides a graphical user interface with a text editor to create and edit mCRL2 specifications and it provides the core functionality of the toolset such as visualising the (reduced) state space and verifying properties. The tools that correspond to this functionality are abstracted away from the user; only one or a few button clicks are needed.

See Fig. 6 for an instance of mcrl2ide with an open project, consisting of an mCRL2 specification and a number of properties. The UI consists of an editor for mCRL2 specifications, a toolbar at the top, a dock listing defined properties on the right and a dock with console output at the bottom. The toolbar contains buttons for creating, opening and saving a project and buttons for running tools. The properties dock allows verifying each single property on the given mCRL2 specification, editing/removing properties and showing the witness/counterexample after verification.

### **7 Applications**

The mCRL2 toolset and its capabilities have not gone unnoticed. Over the years numerous initiatives and collaborations have sprouted to apply its functionality.

#### **7.1 mCRL2 as a Verification Back-End**

The mCRL2 toolset enjoys a sustained application in industry, often in the context of case studies carried out by MSc or PhD students. Moreover, the mCRL2 toolset is increasingly used as a back-end aiming at verification of higher-level languages. Some of these applications are built on academic languages; *e.g.*, in [22] the Algebra for Wireless Networks is translated to mCRL2, enabling the verification of protocols for Mobile Ad hoc Networks and Wireless Mesh Networks. Models written in the state-machine based Simple Language of Communicating Objects (SLCO) are translated to mCRL2 to verify shared-memory concurrent systems and reason about the sequential consistency of automatically generated multi-threaded software [42]. Others are targeting more broadly used languages; *e.g.*, in [39], Go programs are translated to mCRL2 and the mCRL2 toolset is used for model checking Go programs.

**Fig. 6.** An instance of mcrl2ide in Windows 10 with an mCRL2 specification of the alternating bit protocol. The properties in the dock on the right are (from top to bottom) *true*, *false* and not checked yet.

The use of mCRL2 in industry is furthermore driven by the current *Formal Model-Driven Engineering* (FMDE) trend. In the FMDE paradigm, programs written in a Domain-Specific Language (DSL) are used to generate both executable code and verifiable models. A recent example is the commercial FMDE toolset *Dezyne* developed by Verum, see [9], which uses mCRL2 to check for livelocks and deadlocks, and which relies on mCRL2's facilities to check for refinement relations (see Sect. 4.2) to check for *interface compliance*. Similar languages and methodologies are under development at other companies. For instance, ASML, one of the world's leading manufacturers of chip-making equipment, is developing the *Alias* language, and Océ, a global leading company in digital imaging, industrial printing and collaborative business services, is developing the *OIL* language. Both FMDE solutions build on mCRL2.

We believe the FMDE trend will continue in the coming years and that it will influence the development of the toolset. For example, the use of refinement checking in the Dezyne back-end has forced us to implement several optimisations (*cf.* Sect. 4.2). Furthermore, machine-generated specifications are typically longer and more verbose than handwritten specifications. This will require a more efficient implementation of the lineariser – as implemented in mcrl22lps – in the coming years.

#### **7.2 Software Product Lines**

A software product line (SPL) is a collection of systems, individually called products, sharing a common core. However, at specific points the products may show slightly different behaviour depending on the presence or absence of so-called features. The overall system can be concisely represented as a featured transition system (FTS), an LTS with both actions and boolean expressions over a set of features decorating the transitions (see [12]). If a product, given its features, fulfils the boolean expression guarding a transition, then that transition may be taken by the product. Basically, there are two ways to analyse SPLs: product-based and family-based. In product-based analysis each product is verified separately; in family-based model checking one seeks to verify a property for a group of products, referred to as a family, as a whole.
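As a rough illustration of these notions, the Python sketch below encodes a tiny FTS, projects it onto individual products and performs a product-based analysis by checking every product separately. The vending-machine example, the features `tea` and `free`, and all function names are hypothetical; the family-based approach, which would reason about sets of products at once, is not shown.

```python
from itertools import chain, combinations

# A featured transition is (source, action, guard, target); the guard is a
# Boolean function over the set of features selected by a product.
FTS = [
    ("idle", "insert_coin", lambda f: True, "paid"),
    ("paid", "serve_coffee", lambda f: True, "idle"),
    ("paid", "serve_tea", lambda f: "tea" in f, "idle"),
    ("idle", "free_drink", lambda f: "free" in f and "tea" not in f, "idle"),
]

def project(fts, features):
    """LTS of one product: keep the transitions whose guard the product fulfils."""
    return [(s, a, t) for (s, a, guard, t) in fts if guard(features)]

def reachable_actions(lts, init):
    """Actions that can occur on some path starting in init."""
    seen, stack, actions = {init}, [init], set()
    while stack:
        state = stack.pop()
        for s, a, t in lts:
            if s == state:
                actions.add(a)
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return actions

def all_products(feature_names):
    """Every subset of the features; a real SPL would restrict this set."""
    names = sorted(feature_names)
    subsets = chain.from_iterable(
        combinations(names, n) for n in range(len(names) + 1)
    )
    return [frozenset(s) for s in subsets]

def products_that_can(fts, init, feature_names, action):
    """Product-based analysis: check each product separately."""
    return {
        p
        for p in all_products(feature_names)
        if action in reachable_actions(project(fts, p), init)
    }

tea_products = products_that_can(FTS, "idle", {"tea", "free"}, "serve_tea")
free_products = products_that_can(FTS, "idle", {"tea", "free"}, "free_drink")
```

The number of products grows exponentially with the number of features, which is the main motivation for family-based techniques.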

Traditionally, dedicated model checkers are exploited for the verification of SPLs. Examples of such SPL model checkers are SNIP and ProVeLines by the team of [12] that are derived from SPIN. However, the mCRL2 toolset as-is, without specific modifications, has also been used to compare product-based vs. family-based model checking [3,5,7]. For this, the extension of the modal μ-calculus for the analysis of FTSes proposed in [4], that combines actions and feature expressions for its modalities, was translated into the first-order μ-calculus [25], the property language of the mCRL2 toolset. As a result, verification of SPLs can be done using the standard workflow for mCRL2, achieving family-based model checking without a family-based model checker [18], with running times slightly worse than, but comparable to those of dedicated tools.

#### **8 Related Work**

Among the many model checkers available, the CADP toolset [21] is the closest related to mCRL2. In CADP, specifications are written in the Lotos NT language, which has been derived from the E-Lotos ISO standard. Similar to mCRL2, CADP relies on *action-based* semantics, *i.e.*, state spaces are stored as an LTS. Furthermore, the verification engine in CADP takes a μ-calculus formula as input and encodes it in a BES or PBES. However, CADP has limited support for μ-calculus formulae with fixpoint alternation and, unlike mCRL2, does not support arbitrary nesting of fixpoints. Whereas the probabilistic analysis tools for mCRL2 are still in their infancy, CADP offers more advanced analysis techniques for Markovian probabilistic systems. The user-license of CADP is restrictive: CADP is not open source and a free license is only available for academic use.

Another toolset that is based on process algebra is Pat [45]. This toolset has native support for the verification of real-time specifications and implements on-the-fly reduction techniques, in particular partial-order reduction and symmetry reduction. Pat can perform model checking of LTL properties.

The toolset LTSmin [36] has a unique architecture in the sense that it is language-independent. One of the supported input languages is mCRL2. Thus, the state space of an mCRL2 specification can also be generated using LTSmin's high-performance multi-core and symbolic back-ends.

Well-known tools that have less in common with mCRL2 are SPIN [34], NuSMV [11], PRISM [38] and UPPAAL [6]. Each of these tools has its own strengths. First of all, SPIN is an explicit-state model checker that incorporates advanced techniques to reduce the size of the state space (partial-order reduction and symmetry reduction) or the amount of memory required (bit hashing). SPIN supports the checking of assertions and LTL formulae. Secondly, NuSMV is a powerful symbolic model checker that offers model checking algorithms such as bounded model checking and counterexample guided abstraction refinement (CEGAR). The tools PRISM and UPPAAL focus on quantitative aspects of model checking. The main goal of PRISM is to analyse probabilistic systems, whereas UPPAAL focusses on systems that involve real-time behaviour.

#### **9 Conclusion**

In the past six years many additions and changes have been made to the mCRL2 toolset and language to improve its expressivity, usability and performance. Firstly, the mCRL2 language has been extended to enable modelling of probabilistic behaviour. Secondly, by adding the ability to check refinement and to do infinite-state model checking the mCRL2 toolset has become applicable in a wider range of situations. Also, the introduction of the generation of counterexamples and witnesses for model checking problems and the introduction of an enhanced GUI has improved the experience of users of the mCRL2 toolset. Lastly, refinements to underlying algorithms, such as those for equivalence reductions and static analyses of PBESs, have resulted in lowered running times when applying the corresponding tools.

For the future, we aim to further strengthen several basic building blocks of the toolset, in particular the term library and the rewriter. The term library is responsible for storage and retrieval of terms that underlie mCRL2 data expressions. The rewriter manipulates data expressions based on rewrite rules specified by the user. These two components have evolved over time but are only sparsely documented, and it has proven difficult to revitalise the current implementation or to amend it in order to experiment with new ideas. One of the aims of this effort is to investigate the benefits of multi-core algorithms, from which we expect a subsequent speed-up for many other algorithms in the toolset.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Automatic Analysis of Consistency Properties of Distributed Transaction Systems in Maude**

Si Liu<sup>1(✉)</sup>, Peter Csaba Ölveczky<sup>2(✉)</sup>, Min Zhang<sup>3(✉)</sup>, Qi Wang<sup>1</sup>, and José Meseguer<sup>1</sup>

<sup>1</sup> University of Illinois, Urbana-Champaign, USA siliu3@illinois.edu

<sup>2</sup> University of Oslo, Oslo, Norway peterol@ifi.uio.no

<sup>3</sup> Shanghai Key Laboratory of Trustworthy Computing, ECNU, Shanghai, China zhangmin@sei.ecnu.edu.cn

**Abstract.** Many transaction systems distribute, partition, and replicate their data for scalability, availability, and fault tolerance. However, observing and maintaining strong consistency of distributed and partially replicated data leads to high transaction latencies. Since different applications require different consistency guarantees, there is a plethora of consistency properties (from weak ones such as read atomicity through various forms of snapshot isolation to stronger serializability properties) and of distributed transaction systems (DTSs) guaranteeing such properties. This paper presents a general framework for formally specifying a DTS in Maude, and formalizes in Maude nine common consistency properties for DTSs so defined. Furthermore, we provide a fully automated method for analyzing whether the DTS satisfies the desired property for all initial states up to given bounds on system parameters. This is based on automatically recording relevant history during a Maude run and defining the consistency properties on such histories. To the best of our knowledge, this is the first time that model checking of all these properties in a unified, systematic manner is investigated. We have implemented a tool that automates our method, and use it to model check state-of-the-art DTSs such as P-Store, RAMP, Walter, Jessy, and ROLA.

### **1 Introduction**

Applications handling large amounts of data need to partition their data for scalability and elasticity, and need to replicate their data across widely distributed sites for high availability and fault and disaster tolerance. However, guaranteeing strong consistency properties for transactions over partially replicated distributed data requires a lot of costly coordination that results in long transaction delays. Different applications require different consistency guarantees, and balancing the trade-off between performance and consistency guarantees well is key to designing distributed transaction systems (DTSs). There is therefore a plethora of consistency properties for DTSs over partially replicated data (from weak properties such as read atomicity through various forms of snapshot isolation to strong serializability guarantees) and of DTSs providing such guarantees.

© The Author(s) 2019

This work has been partially supported by NRL contract N00173-17-1-G002, and NSFC Project No. 61872146.

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 40–57, 2019. https://doi.org/10.1007/978-3-030-17465-1\_3

DTSs and their consistency guarantees are typically specified informally and validated only by testing; there is very little work on their automated formal analysis (see Section 8). We have previously formally modeled and analyzed single state-of-the-art industrial and academic DTSs, such as Google's Megastore, Apache Cassandra, Walter, P-Store, Jessy, ROLA, and RAMP, in Maude [14].

In this paper we present a *generic* framework for formalizing both DTSs and their consistency properties in Maude. The modeling framework is very general and should allow us to naturally model most DTSs. We formalize nine popular consistency models in this framework and provide a fully automated method (and a tool which automates this method) for analyzing whether a DTS specified in our framework satisfies the desired consistency property for all initial states with the user-given number of transactions, data items, sites, and so on.

In particular, we show how one can automatically add a monitoring mechanism which records relevant history during a run of a DTS specified in our framework, and we define the consistency properties on such histories so that the DTS can be directly model checked in Maude. We have implemented a tool that uses Maude's meta-programming features to automatically add the monitoring mechanism, that automatically generates all the desired initial states, and that performs the Maude model checking. We have applied our tool to model check state-of-the-art DTSs such as variants of RAMP, P-Store, ROLA, Walter, and Jessy. To the best of our knowledge, this is the first time that model checking of all these properties in a unified, systematic manner is investigated.

This paper is organized as follows. Section 2 provides background on rewriting and Maude. Section 3 gives an overview of the consistency properties that we formalize. Section 4 presents our framework for modeling DTSs in Maude, and Section 5 explains how to record the history in such models. Section 6 formally defines consistency models as Maude functions on such recorded histories. Section 7 briefly introduces our tool which automates the entire process. Finally, Section 8 discusses related work and Section 9 gives some concluding remarks.

#### **2 Rewriting Logic and Maude**

Maude [14] is a rewriting-logic-based executable formal specification language and high-performance analysis tool for object-based distributed systems.

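A Maude module specifies a *rewrite theory* (Σ, E ∪ A, R), where:

– Σ is an algebraic *signature*, that is, a set of sorts, subsorts, and function symbols.

– (Σ, E ∪ A) is a *membership equational logic theory*, with E a set of possibly conditional equations and membership axioms, and A a set of equational axioms such as associativity, commutativity, and identity, so that equational deduction is performed *modulo* the axioms A. The theory (Σ, E ∪ A) specifies the system's states as members of an algebraic data type.

– R is a collection of *labeled conditional rewrite rules* [l] : t −→ t′ **if** *cond*, specifying the system's local transitions.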

Equations and rewrite rules are introduced with, respectively, keywords eq, or ceq for conditional equations, and rl and crl. The mathematical variables in such statements are declared with the keywords var and vars, or can have the form var:sort and be introduced on the fly. An equation f(t<sub>1</sub>,...,t<sub>n</sub>) = t with the owise ("otherwise") attribute can be applied to a subterm f(...) only if no other equation with left-hand side f(u<sub>1</sub>,...,u<sub>n</sub>) can be applied. Maude also provides standard parameterized data types (sets, maps, etc.) that can be instantiated (and renamed); for example, pr SET{Nat} \* (sort Set{Nat} to Nats) defines a sort Nats of *sets* of natural numbers.

A *class* declaration class C | att<sub>1</sub> : s<sub>1</sub>, ..., att<sub>n</sub> : s<sub>n</sub> declares a class C of objects with attributes att<sub>1</sub> to att<sub>n</sub> of sorts s<sub>1</sub> to s<sub>n</sub>. An *object instance* of class C is represented as a term < O : C | att<sub>1</sub> : val<sub>1</sub>, ..., att<sub>n</sub> : val<sub>n</sub> >, where O, of sort Oid, is the object's *identifier*, and where val<sub>1</sub> to val<sub>n</sub> are the current values of the attributes att<sub>1</sub> to att<sub>n</sub>. A *message* is a term of sort Msg. A system state is modeled as a term of the sort Configuration, and has the structure of a *multiset* made up of objects and messages.

The dynamic behavior of a system is axiomatized by specifying each of its transition patterns by a rewrite rule. For example, the rule (with label l)

$$\begin{array}{l} \texttt{rl [l] : m(O,w) < O : C | a1 : x, a2 : O', a3 : z >} \\ \quad \texttt{=> < O : C | a1 : x + w, a2 : O', a3 : z > m'(O',x) .} \end{array}$$

defines a family of transitions in which a message m(O, w) is read and consumed by an object O of class C, whose attribute a1 is changed to x+w, and a new message m'(O',x) is generated. Attributes whose values do not change and do not affect the next state, such as a3 and a2, need not be mentioned in a rule.
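For readers less familiar with rewriting on configurations, the transition pattern described above can be mimicked in Python as follows. This is only an analogy, not Maude semantics: the configuration is a plain list standing in for a multiset, the classes `Obj` and `Msg` and the function `apply_rule_l` are names introduced here, and we assume that the recipient O' of the new message is the value stored in attribute a2.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Obj:
    """Stands for an object < O : C | a1 : x, a2 : O', a3 : z >."""
    oid: str
    cls: str
    a1: int
    a2: str
    a3: int

@dataclass(frozen=True)
class Msg:
    """Stands for a message such as m(O, w) or m'(O', x)."""
    name: str
    target: str
    payload: int

def apply_rule_l(config):
    """Apply the rule once to the first matching message/object pair.

    Returns the new configuration, or None if the rule is not applicable.
    """
    for msg in config:
        if not (isinstance(msg, Msg) and msg.name == "m"):
            continue
        for obj in config:
            if isinstance(obj, Obj) and obj.cls == "C" and obj.oid == msg.target:
                rest = list(config)
                rest.remove(msg)  # the message is consumed
                rest.remove(obj)  # the object is replaced by its updated version
                updated = replace(obj, a1=obj.a1 + msg.payload)  # a1 becomes x + w
                reply = Msg("m'", obj.a2, obj.a1)                # m'(O', x), old x
                return rest + [updated, reply]
    return None

before = [Msg("m", "o1", 5), Obj("o1", "C", 10, "o2", 0)]
after = apply_rule_l(before)
```

Note that the attributes a2 and a3 are carried over unchanged by `replace`, which corresponds to the convention that such attributes need not be mentioned in the rule.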

Maude also supports *metaprogramming* in the sense that a Maude specification *M* can be represented as a *term* M (of sort Module), so that a module transformation can be defined as a Maude function f : Module → Module.

*Reachability Analysis in Maude.* Maude provides a number of analysis methods, including rewriting for simulation purposes, reachability analysis, and linear temporal logic (LTL) model checking. In this paper, we use reachability analysis. Given an initial state *init*, a state pattern *pattern* and an (optional) condition *cond*, Maude's search command searches the reachable state space from *init* in a breadth-first manner for states that match *pattern* such that *cond* holds:

search [*bound*] *init* =>! *pattern* such that *cond* .

where *bound* is an upper bound on the number of solutions to look for. The arrow =>! means that Maude only searches for *final* states (i.e., states that cannot be further rewritten) that match *pattern* and satisfy *cond*. If the arrow is instead =>\* then Maude searches for all reachable states satisfying the search condition.
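The behaviour of the two arrows can be paraphrased by the following Python sketch of a breadth-first reachability search. It is not Maude's implementation; the function `search`, its parameters and the toy successor function `step` are assumptions for illustration, and the pattern and the condition are merged into a single predicate.

```python
from collections import deque

def search(init, successors, pattern, bound=None, final_only=True):
    """Breadth-first search of the states reachable from init.

    final_only=True mimics '=>!': only states without successors are reported.
    final_only=False mimics '=>*': every reachable state is a candidate.
    bound limits the number of solutions that are looked for.
    """
    solutions, seen, queue = [], {init}, deque([init])
    while queue:
        state = queue.popleft()
        nexts = list(successors(state))
        if (not final_only or not nexts) and pattern(state):
            solutions.append(state)
            if bound is not None and len(solutions) >= bound:
                break
        for nxt in nexts:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return solutions

def step(n):
    """Toy transition relation: add 1 or 2, but never exceed 5."""
    return [m for m in (n + 1, n + 2) if m <= 5]

final_states = search(0, step, lambda s: True)
even_states = search(0, step, lambda s: s % 2 == 0, final_only=False)
first_two_even = search(0, step, lambda s: s % 2 == 0, bound=2, final_only=False)
```

In the toy system the only final state is 5, so a search with final_only=True for an even final state returns no solution, much like a Maude search with =>! that reports no solution when a property holds in all final states.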

### **3 Transactional Consistency**

Different applications require different consistency guarantees. There are therefore many consistency properties for DTSs on partially replicated distributed data stores. This paper focuses on the following nine, which span a spectrum from weak consistency such as read committed to strong consistency like serializability:


<sup>1</sup> A transaction is a user application request, typically consisting of a sequence of read and/or write operations on data items, that is submitted to a (distributed) database.

### **4 Modeling Distributed Transaction Systems in Maude**

This section presents a framework for modeling in Maude DTSs that satisfy the following general assumptions:


If such a DTS is modeled in this framework, our tool can automatically model check whether it satisfies the above consistency properties, as long as it can detect the read and write sets and the above events: start of transaction execution, and abort/commit of a transaction at a certain site. This section explains how the system should be modeled so that our tool automatically discovers these events.

We make the following additional assumptions about the DTSs we target:


#### **4.1 Modeling DTSs in Maude**

A DTS is modeled in an object-oriented style, where the state consists of a number of *replica* objects, each modeling a local database/server/site, and a number of messages traveling between the replica objects. A transaction is modeled as an object which resides inside the replica object executing the transaction.

*Basic Data Types.* There are user-defined sorts Key for data items (or keys) and Version for versions of data items, with a partial order < on versions, where v < v' denotes that v' is a later version than v. We then define key-version pairs <*key*,*version*> and sets of such pairs, which model a transaction's read and write sets, as follows:

```
sorts Key Version KeyVersion .
op <_,_> : Key Version -> KeyVersion .
pr SET{KeyVersion} * (sort Set{KeyVersion} to KeyVersions) .
```
<sup>2</sup> Since we do not necessarily deal with real-time systems, this "when" may not denote the real time, but when the event takes place *relative* to other events.

To track the status of a transaction (on non-proxies, or remote servers) we define a sort TxnStatus consisting of some transaction's identifier and its status; this is used to indicate whether a remote transaction (one executed on another server) is committed on this server:

```
op [_,_] : Oid Bool -> TxnStatus [ctor] .
pr SET{TxnStatus} * (sort Set{TxnStatus} to TxnStatusSet) .
```
*Modeling Replicas.* A *replica* (or *site*) stores parts of the database, executes the transactions for which it is the proxy, helps validate other transactions, and is formalized as an object instance of a subclass of the following class Replica:

```
class Replica | executing : Configuration, committed : Configuration,
               aborted : Configuration, decided : TxnStatusSet .
```
The attributes executing, committed, and aborted contain, respectively, transactions that are being executed, and have been committed or aborted on the executing server; decided is the status of transactions executed on other servers.

To model a system-specific replica a user should specify it as an object instance of a subclass of the class Replica with new attributes.

*Example 1.* A replica in our Maude model of Walter [26] is modeled as an object instance of the following subclass Walter-Replica of class Replica that adds 14 new attributes (only 4 shown below):

```
class Walter-Replica | store : Datastore, sqn : Nat,
                      locked : Locks, votes : Vote, ...
subclass Walter-Replica < Replica .
```
*Modeling Transactions.* A *transaction* should be modeled as an object of a subclass of the following class Txn:

class Txn | readSet : KeyVersions, writeSet : KeyVersions .

where readSet and writeSet denote the key/version pairs read and written by the transaction, respectively.

*Example 2.* Walter transactions can be modeled as object instances of the subclass Walter-Txn with four new attributes:

```
class Walter-Txn | operations : OperationList, localVars : LocalVars,
                   startVTS : VectorTimestamp, txnSQN : Nat .
subclass Walter-Txn < Txn .
```
*Modeling System Dynamics.* We describe how the rewrite rules defining the start of a transaction execution and aborts and commits at different sites should be defined so that our tool can detect these events.

– The start of a transaction execution must be modeled by a rewrite rule where the transaction object appears in the proxy server's executing attribute in the right-hand side, but not in the left-hand side, of the rewrite rule.

*Example 3.* A Walter replica starts executing a transaction TID by moving TID from gotTxns (which buffers transactions from clients) to executing:<sup>3</sup>

```
rl [start-txn] :
   < RID : Walter-Replica | executing : TRANSES, committedVTS : VTS,
           gotTxns : < TID : Txn | startVTS : empty > ;; TXNS >
 =>
   < RID : Walter-Replica | gotTxns : TXNS,
           executing : TRANSES < TID : Txn | startVTS : VTS > > .
```
– When a transaction is *committed* on the executing server, the transaction object must appear in the committed attribute in the right-hand side—but not in the left-hand side—of the rewrite rule. Furthermore, the readSet and writeSet attributes must be explicitly given in the transaction object.

*Example 4.* In Walter, when all operations of an executing read-only transaction have been performed, the proxy commits the transaction directly:

```
rl [commit-read-only-txn] :
   < RID : Walter-Replica | committed : TRANSES',
                             executing : TRANSES
      < TID : Txn | operations : nil, writeSet : empty, readSet : RS > >
 =>
   < RID : Walter-Replica | committed : (TRANSES' < TID : Txn | > ),
                             executing : TRANSES > .
```

These requirements are not very strict. The Maude models of the DTSs RAMP [29], Faster [24], Walter [26], ROLA [25], Jessy [28], and P-Store [32] can all be seen as instantiations of our modeling framework, with very small syntactic changes, such as defining transaction and replica objects as subclasses of Txn and Replica, changing the names of the attributes and sorts, etc. The Apache Cassandra NoSQL key-value store can be seen as a transaction system where each transaction is a single operation; the Maude model of Cassandra in [30] can also be easily modified to fit within our modeling framework.

<sup>3</sup> We do not give variable declarations, but follow the convention that variables are written in (all) capital letters.

### **5 Adding Execution Logs**

To formalize and analyze consistency properties of distributed transaction systems we add an "execution log" that records the *history* of relevant events during a system execution. This section explains how this history recording can be added *automatically* to a model of a DTS that is specified as explained in Section 4.

#### **5.1 Execution Log**

To capture the total order of relevant events in a run, we use a "logical global clock" to order all key events (i.e., transaction starts, commits, and aborts). This clock is incremented by one each time such an event takes place.

A transaction in a replicated DTS is typically committed both locally (at its executing server) and remotely at different times. To capture this, we define a "time vector" using Maude's map data type that maps replica identifiers (of sort Oid) to (typically "logical") clock values (of sort Time, which here are the natural numbers: subsort Nat < Time):

```
pr MAP{Oid,Time} * (sort Map{Oid,Time} to VectorTime) .
```
where each element in the mapping has the form *replica-id* |-> *time* .

An execution log (of sort Log) maps each transaction (identifier) to a record <*proxy*, *issueTime*, *finishTime*, *committed*, *reads*, *writes*>, with *proxy* its proxy server, *issueTime* the starting time at its proxy server, *finishTime* the commit/abort times at each relevant server, *committed* a flag indicating whether the transaction is committed at its proxy, *reads* the key-version pairs read by the transaction, and *writes* the key-version pairs written:

```
sort Record .
op <_,_,_,_,_,_> : Oid Time VectorTime
                   Bool KeyVersions KeyVersions -> Record .
pr MAP{Oid,Record} * (sort Map{Oid,Record} to Log) .
```
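As a reading aid, the shape of the log can be mirrored in Python. This is only a sketch: the type names below are hypothetical, and representing versions and times as integers is an assumption made for illustration (the framework itself only requires a partial order on versions).

```python
from typing import Dict, FrozenSet, NamedTuple, Tuple

KeyVersion = Tuple[str, int]      # <key, version>; integer versions are an assumption
VectorTime = Dict[str, int]       # replica id |-> logical time

class Record(NamedTuple):
    proxy: str                    # replica that executes the transaction
    issue_time: int               # start time at the proxy
    finish_time: VectorTime       # commit/abort time at each relevant replica
    committed: bool               # committed at the proxy?
    reads: FrozenSet[KeyVersion]
    writes: FrozenSet[KeyVersion]

Log = Dict[str, Record]           # transaction id |-> record

# A log with one read-only transaction t1 that started at replica r1 at
# time 0 and committed there at time 1 after reading version 3 of key x.
log: Log = {
    "t1": Record("r1", 0, {"r1": 1}, True, frozenset({("x", 3)}), frozenset()),
}
```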
#### **5.2 Logging Execution History**

We show how the relevant history of an execution can be recorded during a run of our Maude model by transforming the original Maude model into one which also records this history.

First, we add to the state a Monitor object that stores the current logical global time in the clock attribute and the current log in the log attribute:

< *M* : Monitor | clock : *Time*, log : *Log* >.

The log is updated each time an interesting event (see Section 4.1) happens. Our tool identifies those events and *automatically* transforms the corresponding rewrite rules by adding and updating the monitor object.

Executing. A transaction starts executing when the transaction object appears in a Replica's executing attribute in the right-hand side, but not in the left-hand side, of a rewrite rule. The monitor then adds a record for this transaction, with the proxy and start time, to the log, and increments the logical global clock.

*Example 5.* The rewrite rule in Example 3 where a Walter replica is served a transaction is modified by adding and updating the monitor object:

```
rl [start-txn] :
   < O@M : Monitor | clock : GT@M, log : LOG@M >
   < RID : Walter-Replica | executing : TRANSES, committedVTS : VTS,
                 gotTxns : < TID : Txn | startVTS : empty > ;; TXNS >
 =>
   < O@M : Monitor | clock : GT@M + 1 , log : LOG@M,
                       (TID |-> < RID, GT@M, empty, false, empty, empty >) >
   < RID : Walter-Replica | gotTxns : TXNS,
                 executing : TRANSES < TID : Txn | startVTS : VTS > > .
```
where the monitor O@M adds a new record for the transaction TID in the log, with starting time (i.e., the current logical global time) GT@M at its executing server RID, finish time (empty), flag (false), read set (empty), and write set (empty). The monitor also increments the global clock by one.

Commit. A transaction commits at its proxy when the transaction object appears in the proxy's committed attribute in the right-hand side, but not in the left-hand side, of a rewrite rule. The record for that transaction is updated with commit status, versions read and written, and commit time, and the global logical clock is incremented.

*Example 6.* The monitor object is added to the rewrite rule in Example 4 for committing a read-only transaction:

```
rl [commit-read-only-txn] :
   < O@M : Monitor | clock : GT@M, log : LOG@M ,
             (TID |-> < RID, T@M, VTS@M, FLAG@M, READS@M, WRITES@M >) >
   < RID : Walter-Replica | committed : TRANSES',
                             executing : TRANSES
           < TID : Txn | operations : nil, writeSet : empty, readSet : RS > >
 =>
   < O@M : Monitor | clock : GT@M + 1 , log : LOG@M ,
             (TID |-> < RID, T@M, insert(RID,GT@M,VTS@M), true, RS, empty >) >
   < RID : Walter-Replica | committed : (TRANSES' < TID : Txn | >),
                             executing : TRANSES > .
```
The monitor updates the log for the transaction TID by setting its finish time at the executing server RID to GT@M (insert(RID,GT@M,VTS@M)), setting the committed flag to true, setting the read set to RS and write set to empty (this is a read-only transaction), and increments the global clock.

Abort. Abort is treated as commit, but the commit flag remains false.

Decided. When a transaction's status is decided remotely, the record for that transaction's decision time at the remote replica is updated with the current global time. See [27] for an example.
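The four kinds of log updates described above can be summarized in a short Python sketch. It is not the monitorRules transformation itself; the class and method names are hypothetical. One assumption is labeled in the code: the text does not say whether the clock advances on a remote decision, and the sketch advances it.

```python
class Monitor:
    """Logical global clock plus execution log (hypothetical analogue)."""

    def __init__(self):
        self.clock = 0
        self.log = {}   # transaction id -> record fields

    def start(self, tid, proxy):
        # Executing: new record with empty finish times, flag false,
        # and empty read/write sets; then advance the clock.
        self.log[tid] = {"proxy": proxy, "issue": self.clock, "finish": {},
                         "committed": False, "reads": set(), "writes": set()}
        self.clock += 1

    def finish(self, tid, reads, writes, committed):
        # Commit (committed=True) or abort (committed=False) at the proxy:
        # store the finish time and the versions read and written.
        rec = self.log[tid]
        rec["finish"][rec["proxy"]] = self.clock
        rec["committed"] = committed
        rec["reads"], rec["writes"] = set(reads), set(writes)
        self.clock += 1

    def decided(self, tid, replica):
        # Decided: store the decision time at the remote replica.
        # Assumption: the clock is advanced here as well.
        self.log[tid]["finish"][replica] = self.clock
        self.clock += 1

m = Monitor()
m.start("t1", "r1")                          # issue time 0
m.finish("t1", {("x", 0)}, set(), True)      # committed at r1 at time 1
m.decided("t1", "r2")                        # decided at r2 at time 2
```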

We have formalized/implemented the transformation from a Maude specification of a DTS into one with a monitor as a meta-level function monitorRules : Module -> Module in Maude. See our longer report [27] for details.

### **6 Formalizing Consistency Models in Maude**

This section formalizes the consistency properties in Section 3 as functions on the "history log" of a *completed* run. The entire Maude specification of these functions is available at https://github.com/siliunobi/cat. Due to space restrictions, we only show the formalization of four of the consistency models, and refer to our report [27] for the formalization of the other properties.

*Read Committed (RC).* (A transaction cannot read any writes by uncommitted transactions.) Note that standard definitions for single-version databases disallow reading versions that are not committed at the time of the read. We follow the definition for multi-versioned systems by Adya, summarized by Bailis et al. [5], that defines the *RC* property as follows: (i) a committed transaction cannot read a version that was written by an aborted transaction; and (ii) a transaction cannot read *intermediate values*: that is, if T writes two versions < X,V > and < X,V' > with V < V', then no T' ≠ T can read < X,V >.

The first equation defining the function rc, specifying when *RC* holds, checks whether some (committed) transaction TID1 read version V of key X (i.e., < X,V > is in TID1's read set < X,V > , RS, where RS matches the rest of TID1's read set), and this version V was written by some transaction TID2 that was never committed (i.e., TID2's commit flag is false, and its write set is < X,V > , WS'). The second equation checks whether there was an *intermediate* read of a version < X,V > that was overwritten by the same transaction TID2 that wrote the version:<sup>4</sup>

```
op rc : Log -> Bool .
eq rc(TID1 |-> < O, T, VT, true, (< X,V >, RS), WS >,
      TID2 |-> < O', T', VT', false, RS', (< X,V >, WS') >, LOG) = false .
ceq rc(TID1 |-> < O, T, VT, true, (< X,V >, RS), WS >,
       TID2 |-> < O', T', VT', true, RS', (< X,V >, < X,V' >,WS') >,
       LOG) = false if V < V' .
eq rc(LOG) = true [owise] .
```
<sup>4</sup> The configuration union and the union operator ',' for maps and sets are declared *associative* and *commutative*. The first equation therefore matches *any* log where some committed transaction read a key-version pair written by some aborted transaction.
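The two *RC* equations can be paraphrased over a Python dictionary, where explicit loops take the place of associative-commutative matching. This is a sketch under stated assumptions, not the Maude definition: `Record` and its field names are hypothetical, and versions are integers so that `<` is available.

```python
from typing import NamedTuple

class Record(NamedTuple):
    proxy: str
    issue: int
    finish: dict          # replica id -> commit/abort time
    committed: bool
    reads: frozenset      # of (key, version) pairs
    writes: frozenset

def rc(log):
    """Return False if a committed reader saw an aborted or intermediate write."""
    for t1, reader in log.items():
        if not reader.committed:
            continue
        for t2, writer in log.items():
            if t1 == t2:
                continue
            for key, version in reader.reads & writer.writes:
                if not writer.committed:
                    return False      # first equation: writer never committed
                if any(k == key and version < v for k, v in writer.writes):
                    return False      # second equation: intermediate version
    return True

reader = Record("r1", 2, {"r1": 3}, True, frozenset({("x", 1)}), frozenset())
aborted = Record("r2", 0, {"r2": 1}, False, frozenset(), frozenset({("x", 1)}))
twice = Record("r2", 0, {"r2": 1}, True, frozenset(),
               frozenset({("x", 1), ("x", 2)}))
once = twice._replace(writes=frozenset({("x", 1)}))
```

Here `rc({"t1": reader, "t2": aborted})` and `rc({"t1": reader, "t2": twice})` both report a violation, while the log with `once` satisfies *RC*.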

*Read Atomicity (RA).* A system guarantees *RA* if it prevents fractured reads and prevents transactions from reading uncommitted or aborted data. A transaction T<sub>j</sub> exhibits *fractured reads* if transaction T<sub>i</sub> writes versions x<sub>m</sub> and y<sub>n</sub>, T<sub>j</sub> reads version x<sub>m</sub> and version y<sub>k</sub>, and k < n [5]. The function fracRead checks whether there are fractured reads in the log. There is a fractured read if a transaction TID1 reads X and Y, transaction TID2 writes X and Y, TID1 reads the version VX of X written by TID2, and reads a version VY' of Y written *before* VY (VY' < VY):

```
op fracRead : Log -> Bool .
ceq fracRead(TID1 |-> < O, T, VT, true, (< X,VX > , < Y,VY' >, RS), WS >,
              TID2 |-> < O', T', VT', true, RS', (< X,VX > , < Y,VY >, WS') >, LOG)
    = true if VY' < VY .
eq fracRead(LOG) = false [owise] .
```
We define *RA* as the combination of *RC* and no fractured reads:

```
op ra : Log -> Bool .
eq ra(LOG) = rc(LOG) and not fracRead(LOG) .
```
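A Python paraphrase of the fractured-read check may help when reading the equation. It is a sketch only: `Record` and `frac_read` are hypothetical names, versions are assumed to be integers, and the sketch requires X and Y to be distinct keys, which is a simplification of the pattern above.

```python
from typing import NamedTuple

class Record(NamedTuple):
    proxy: str
    issue: int
    finish: dict
    committed: bool
    reads: frozenset      # of (key, version) pairs
    writes: frozenset

def frac_read(log):
    """True if a committed reader saw only part of a committed writer's updates."""
    for t1, reader in log.items():
        for t2, writer in log.items():
            if t1 == t2 or not (reader.committed and writer.committed):
                continue
            for x, _ in reader.reads & writer.writes:
                # The reader saw the writer's version of x; look for a key y
                # where it read a version older than the one the writer wrote.
                for y, read_version in reader.reads:
                    for y2, written_version in writer.writes:
                        if y == y2 and y != x and read_version < written_version:
                            return True
    return False

writer = Record("r1", 0, {"r1": 1}, True, frozenset(),
                frozenset({("x", 1), ("y", 1)}))
fractured = Record("r2", 2, {"r2": 3}, True,
                   frozenset({("x", 1), ("y", 0)}), frozenset())
atomic = fractured._replace(reads=frozenset({("x", 1), ("y", 1)}))
```

With an `rc` check of the same shape, *RA* is then the conjunction `rc(log) and not frac_read(log)`.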
*Parallel snapshot isolation (PSI)* is given by three properties [36]:


The function notSiteSnapshotRead checks whether the system log satisfies PSI-1 by returning true if there is a transaction that did not read the most recent committed version at its executing site when it began:

```
op notSiteSnapshotRead : Log -> Bool .
ceq notSiteSnapshotRead(
       TID1 |-> < RID1, T, VT1, true, (< X,V > , RS1), WS1 >,
       TID2 |-> < RID2, T', (RID1 |-> T2 , VT2), true, RS2, (< X,V > , WS2) >,
       TID3 |-> < RID3, T'', (RID1 |-> T3 , VT3), true, RS3,(< X,V' > , WS3) >,
       LOG) = true if V =/= V' /\ T3 < T /\ T3 > T2 .
ceq notSiteSnapshotRead(
       TID1 |-> < RID1, T, VT1, true, (< X,V > , RS1), WS1 >,
       TID2 |-> < RID2, T', (RID1 |-> T2 , VT2), true, RS2, (< X,V > , WS2) >,
       LOG) = true if T < T2 .
 eq notSiteSnapshotRead(LOG) = false [owise] .
```
<sup>5</sup> Two transactions are *somewhere-concurrent* if they are concurrent at one of their sites.

In the first equation, the transaction TID1, hosted at site RID1, has in its read set a version < X,V > written by TID2. Some transaction TID3 wrote version < X,V' > and was committed at RID1 after TID2 was committed at RID1 (T3 > T2) and before TID1 started executing (T3 < T). Hence, the version read by TID1 was stale. The second equation checks if TID1 read some version that was committed at RID1 after TID1 started (T < T2).
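The two cases can also be written as explicit loops in Python. This sketch is an illustration under assumptions, not the Maude definition: `Record` and `not_site_snapshot_read` are hypothetical names, and logical times and versions are integers.

```python
from typing import NamedTuple

class Record(NamedTuple):
    proxy: str
    issue: int
    finish: dict          # replica id -> commit time at that replica
    committed: bool
    reads: frozenset      # of (key, version) pairs
    writes: frozenset

def not_site_snapshot_read(log):
    for t1, r1 in log.items():
        if not r1.committed:
            continue
        site, start = r1.proxy, r1.issue
        for t2, r2 in log.items():
            if t2 == t1 or not r2.committed or site not in r2.finish:
                continue
            t2_commit = r2.finish[site]
            for x, v in r1.reads & r2.writes:
                if start < t2_commit:
                    return True       # second equation: version committed after start
                for t3, r3 in log.items():
                    if t3 in (t1, t2) or not r3.committed or site not in r3.finish:
                        continue
                    other = any(k == x and v3 != v for k, v3 in r3.writes)
                    if other and t2_commit < r3.finish[site] < start:
                        return True   # first equation: the version read was stale
    return False

fs = frozenset
t1 = Record("r1", 5, {"r1": 6}, True, fs({("x", 1)}), fs())
t2 = Record("r2", 0, {"r2": 1, "r1": 1}, True, fs(), fs({("x", 1)}))
t3 = Record("r2", 2, {"r2": 3, "r1": 3}, True, fs(), fs({("x", 2)}))
stale_log = {"t1": t1, "t2": t2, "t3": t3}
fresh_log = {"t1": t1, "t2": t2, "t3": t3._replace(finish={"r2": 3, "r1": 7})}
```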

The function someWhereConflict checks whether PSI-2 holds by looking for a write-write conflict between any pair of committed *somewhere-concurrent transactions* in the system log:

```
op someWhereConflict : Log -> Bool .
ceq someWhereConflict(
       TID1 |-> < RID1, T, (RID1 |-> T1 , VT1), true, RS, (< X,V > , WS) >,
       TID2 |-> < RID2, T', (RID1 |-> T2 , VT2), true, RS', (< X,V' > , WS') >,
       LOG) = true if T2 > T /\ T2 < T1 .
 eq someWhereConflict(LOG) = false [owise] .
```
The above function checks whether the transactions with the write conflict are concurrent at the transaction TID1's proxy RID1. Here, TID2 commits at RID1 at time T2, which is between TID1's start time T and its commit time T1 at RID1.
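A Python sketch of the same check, with hypothetical names (`Record`, `somewhere_conflict`) and integer logical times as an assumption, looks for two committed transactions that write a common key and overlap in time at the first transaction's proxy:

```python
from typing import NamedTuple

class Record(NamedTuple):
    proxy: str
    issue: int
    finish: dict          # replica id -> commit time at that replica
    committed: bool
    reads: frozenset
    writes: frozenset     # of (key, version) pairs

def somewhere_conflict(log):
    for t1, r1 in log.items():
        if not r1.committed or r1.proxy not in r1.finish:
            continue
        start, commit = r1.issue, r1.finish[r1.proxy]
        keys1 = {k for k, _ in r1.writes}
        for t2, r2 in log.items():
            if t2 == t1 or not r2.committed or r1.proxy not in r2.finish:
                continue
            keys2 = {k for k, _ in r2.writes}
            # t2 commits at t1's proxy strictly between t1's start and commit.
            if keys1 & keys2 and start < r2.finish[r1.proxy] < commit:
                return True
    return False

fs = frozenset
t1 = Record("r1", 0, {"r1": 5}, True, fs(), fs({("x", 1)}))
t2 = Record("r2", 1, {"r2": 2, "r1": 3}, True, fs(), fs({("x", 2)}))
conflict_log = {"t1": t1, "t2": t2}
ordered_log = {"t1": t1, "t2": t2._replace(finish={"r2": 2, "r1": 6})}
```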

The function notCausality analyzes PSI-3 by checking whether there was a "bad situation" in which a transaction TID1 committed at site RID2 *before* a transaction TID2 started at site RID2 (T1 < T2), while TID1 committed at site RID *after* TID2 committed at site RID (T3 > T4):

```
op notCausality : Log -> Bool .
ceq notCausality(
        TID1 |-> < RID1, T, (RID2 |-> T1 , RID |-> T3 , VT2), true, RS, WS >,
        TID2 |-> < RID2, T2, (RID |-> T4 , VT4), true, RS', WS' >,
        LOG) = true if T1 < T2 /\ T3 > T4 .
 eq notCausality(LOG) = false [owise] .
```
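The "bad situation" can be spelled out as nested loops in Python. This is a sketch with hypothetical names (`Record`, `not_causality`) and integer logical times as an assumption; it is meant as a reading aid for the equation, not as a replacement for it.

```python
from typing import NamedTuple

class Record(NamedTuple):
    proxy: str
    issue: int
    finish: dict          # replica id -> commit time at that replica
    committed: bool
    reads: frozenset
    writes: frozenset

def not_causality(log):
    for t1, r1 in log.items():
        for t2, r2 in log.items():
            if t1 == t2 or not (r1.committed and r2.committed):
                continue
            site2 = r2.proxy
            # t1 must be committed at t2's proxy before t2 starts there.
            if site2 not in r1.finish or not r1.finish[site2] < r2.issue:
                continue
            for site, t1_commit in r1.finish.items():
                if site != site2 and site in r2.finish \
                        and t1_commit > r2.finish[site]:
                    return True   # some other site saw t2 before t1
    return False

fs = frozenset
t1 = Record("r1", 0, {"r1": 1, "r2": 2, "r3": 9}, True, fs(), fs())
t2 = Record("r2", 3, {"r2": 4, "r3": 5}, True, fs(), fs())
bad_log = {"t1": t1, "t2": t2}
good_log = {"t1": t1._replace(finish={"r1": 1, "r2": 2, "r3": 4}), "t2": t2}
```

In `bad_log`, t1 is committed at r2 (time 2) before t2 starts there (time 3), yet r3 commits t2 (time 5) before t1 (time 9).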
*PSI* can then be defined by combining the above three properties:

```
op psi : Log -> Bool .
eq psi(LOG) = not notSiteSnapshotRead(LOG) and
              not someWhereConflict(LOG) and not notCausality(LOG) .
```
*Non-monotonic snapshot isolation (NMSI)* is the same as *PSI* except that a transaction may read a version committed even after the transaction begins [3]. *NMSI* can therefore be defined as the conjunction of PSI-2 and PSI-3:

```
op nmsi : Log -> Bool .
eq nmsi(LOG) = not someWhereConflict(LOG) and not notCausality(LOG) .
```
*Serializability (SER)* means that the concurrent execution of transactions is equivalent to executing them in some (non-overlapping in time) sequence [33].

A formal definition of *SER* is based on *direct serialization graphs* (DSGs): an execution is serializable if and only if the corresponding DSG is acyclic. Each node in a DSG corresponds to a committed transaction, and directed edges in a DSG correspond to the following types of direct dependencies [2]:


There is a directed edge from a node T<sub>i</sub> to another node T<sub>j</sub> if transaction T<sub>j</sub> directly read-/write-/antidepends on transaction T<sub>i</sub>.

The dependencies/edges can easily be extracted from our log as follows:


We have defined a data type Dsg for DSGs, a function dsg : Log -> Dsg that constructs the DSG from a log, and a function cycle : Dsg -> Bool that checks whether a DSG has cycles. We refer to [27] for their definition in Maude.

*SER* then holds if there is no cycle in the constructed DSG:

op ser : Log -> Bool . eq ser(LOG) = not cycle(dsg(LOG)) .
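The acyclicity test itself is standard and can be sketched in Python with a depth-first search. The Maude definitions of dsg and cycle are given in [27] and are not reproduced here; this sketch assumes the dependency edges have already been extracted from the log and are passed in as pairs of transaction identifiers. The names `has_cycle` and `ser` are hypothetical.

```python
def has_cycle(edges):
    """Detect a directed cycle using a three-colour depth-first search."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the DFS stack, finished
    colour = {node: WHITE for node in graph}

    def visit(node):
        colour[node] = GREY
        for succ in graph[node]:
            if colour[succ] == GREY:
                return True           # back edge: a cycle exists
            if colour[succ] == WHITE and visit(succ):
                return True
        colour[node] = BLACK
        return False

    return any(colour[node] == WHITE and visit(node) for node in graph)

def ser(edges):
    """An execution is serializable iff its dependency graph is acyclic."""
    return not has_cycle(edges)

chain = [("t1", "t2"), ("t2", "t3")]
loop = [("t1", "t2"), ("t2", "t3"), ("t3", "t1")]
```

The recursion depth is bounded by the number of transactions, which is small for the initial states considered in bounded model checking.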

### **7 Formal Analysis of Consistency Properties of DTSs**

We have implemented the *Consistency Analysis Tool* (CAT) that automates the method in this paper. CAT takes as input:


Given these inputs, CAT performs the following steps:


*search [1] init =>! C:Configuration < M:Oid : Monitor | log: LOG:Log clock: N:Nat > such that not consistency-property(LOG:Log) .*

where the underlined functions are parametric, and are instantiated by the user inputs; e.g., *consistency-property* is replaced by the corresponding function rc, psi, nmsi, . . . , or ser, depending on which property to analyze.

CAT outputs either "No solution," meaning that all runs from all the given initial states satisfy the desired consistency property, or a counterexample (in Maude at the moment) showing a behavior that violates the property.

**Table 1.** Model checking results w.r.t. consistency properties. "✓", "×", and "-" refer to satisfying the property, violating the property, and "not applicable," respectively.


We have applied our tool to 14 Maude models of state-of-the-art academic DTSs (different variants of RAMP and Walter, ROLA, Jessy, and P-Store) against all nine properties. Table 1 only shows six case studies due to space limitations. All model checking results are as expected. It is worth remarking that our automatic analysis found all the violations of properties that the respective systems should violate. There are also some cases where model checking is not applicable ("-" in Table 1): some system models do not include a mechanism for committing a transaction on remote servers (i.e., no commit time on any remote server is recorded by the monitor). Thus, model checking *NMSI* or *PSI* is not applicable.

We have performed our analysis with different initial states, with up to 4 transactions, 4 operations per transaction, 2 clients, 2 servers, 2 keys, and 2 replicas per key. Each analysis command took about 15 minutes (worst case) to execute on a 2.9 GHz Intel 4-Core i7-3520M CPU with 3.6 GB memory.

### **8 Related Work**

*Formalizing Consistency Properties in a Single Framework.* Adya [2] uses dependencies between reads and writes to define different isolation models in database systems. Bailis et al. [5] adopt this model to define read atomicity. Burckhardt et al. [11] and Cerone et al. [12] propose axiomatic specifications of consistency models for transaction systems using visibility and arbitration relationships. Shapiro et al. [35] propose a classification along three dimensions (total order, visibility, and transaction composition) for transactional consistency models. Crooks et al. [15] formalize transactional consistency properties in terms of observable states from a client's perspective. On the non-transactional side, Burckhardt [10] focuses on session and eventual consistency models. Viotti et al. [38] expand his work by covering more than 50 non-transactional consistency properties. Szekeres et al. [37] propose a unified model based on result visibility to formalize both transactional and non-transactional consistency properties.

All of these studies propose semantic models of consistency properties suitable for theoretical analysis. In contrast, we aim at algorithmic methods for automatically verifying consistency properties based on executable specifications of both the systems and their consistency models. Furthermore, none of the studies covered all of the transactional consistency models considered in this paper.

*Model Checking Distributed Transaction Systems.* There is very little work on model checking state-of-the-art DTSs, maybe because the complexity of these systems requires expressive formalisms. Engineers at Amazon Web Services successfully used TLA+ to model check key algorithms in Amazon's Simple Storage Service and DynamoDB database [31]; however, they do not state which consistency properties, if any, were model checked. The designers of the TAPIR transaction protocol have specified and model checked correctness properties of their design using TLA+ [41]. The IronFleet framework [20] combines TLA+ analysis and Floyd-Hoare-style imperative verification to reason about protocol-level concurrency and implementation complexities, respectively. Their methodology requires "considerable assistance from the developer" to perform the proofs.

Distributed model checkers [22,40] are used to model check *implementations* of distributed systems such as Cassandra, ZooKeeper, the BerkeleyDB database and a replication protocol implementation.

Our previous work [8,18,19,24–26,28,29,32] specifies and model checks *single* DTSs and consistency properties in different ways, as opposed to in a single framework that, furthermore, automates the "monitoring" and analysis process.

*Other Formal Reasoning about Distributed Database Systems.* Cerone et al. [13] develop a new characterization of *SI* and apply it to the static analysis of DTSs. Bernardi et al. [7] propose criteria for checking the robustness of transactional programs against consistency models. Bouajjani et al. [9] propose a formal definition of eventual consistency, and reduce the problem of checking eventual consistency to reachability and model checking problems. Gotsman *et al.* [17] propose a proof rule for reasoning about non-transactional consistency choices.

There is also work [23,34,39] that focuses on specifying, implementing and verifying distributed systems using the Coq proof assistant. Their executable Coq "implementations" can be seen as executable high-level formal specifications, but the theorem proving requires nontrivial user interaction.

### **9 Concluding Remarks**

In this paper we have provided an object-based framework for formally modeling distributed transaction systems (DTSs) in Maude, have explained how such models can be automatically instrumented to record relevant events during a run, and have formally defined a wide range of consistency properties on such histories of events. We have implemented a tool which automates the entire instrumentation and model checking process. Our framework is very general: we could easily adapt previous Maude models of state-of-the-art DTSs such as Apache Cassandra, P-Store, RAMP, Walter, Jessy, and ROLA to our framework.

We then model checked the DTSs w.r.t. all the consistency properties for all initial states with 4 transactions, 2 sites, and so on. This analysis was sufficient to differentiate the DTSs according to which consistency properties they satisfy.

In future work we should formally relate our definitions of the consistency properties to other (non-executable) formalizations of consistency properties. We should also extend our work to formalizing and model checking non-transactional consistency properties for key-value stores such as Cassandra.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Multi-core On-The-Fly Saturation**

Tom van Dijk<sup>1,2</sup>(✉), Jeroen Meijer<sup>1</sup>, and Jaco van de Pol<sup>1,3</sup>

<sup>1</sup> Formal Methods and Tools, University of Twente, Enschede, The Netherlands
t.vandijk@utwente.nl

<sup>2</sup> Formal Models and Verification, Johannes Kepler University, Linz, Austria

<sup>3</sup> Department of Computer Science, University of Aarhus, Aarhus, Denmark

**Abstract.** Saturation is an efficient exploration order for computing the set of reachable states symbolically. Attempts to parallelize saturation have so far resulted in limited speedup. We demonstrate for the first time that on-the-fly symbolic saturation can be successfully parallelized at a large scale. To this end, we implemented saturation in Sylvan's multicore decision diagrams used by the LTSmin model checker.

We report extensive experiments, measuring the speedup of parallel symbolic saturation on a 48-core machine, and compare it with the speedup of parallel symbolic BFS and chaining. We find that the parallel scalability varies from quite modest to excellent. We also compared the speedup of on-the-fly saturation and saturation for pre-learned transition relations. Finally, we compared our implementation of saturation with the existing sequential implementation based on Meddly.

The empirical evaluation uses Petri nets from the model checking contest, but thanks to the architecture of LTSmin, parallel on-the-fly saturation is now available to multiple specification languages. Data or code related to this paper is available at: [34].

### **1 Introduction**

Model checking is an exhaustive algorithm to verify that a finite model of a concurrent system satisfies certain temporal properties. The main challenge is to handle the large state space, resulting from the combination of parallel components. Symbolic model checking exploits regularities in the set of reachable states, by storing this set concisely in a decision diagram. In asynchronous systems, transitions have locality, i.e. they affect only a small part of the state vector. This locality is exploited in the saturation strategy, which is probably the most efficient strategy to compute the set of reachable states.

T. van Dijk—Supported by FWF, NFN Grant S11408-N23 (RiSE).

J. Meijer—Supported by STW SUMBAT Grant 13859.

© The Author(s) 2019. T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 58–75, 2019. https://doi.org/10.1007/978-3-030-17465-1\_4

In this paper, we investigate the efficiency and speedup of a new parallel implementation of saturation, aiming at a multi-core, shared-memory implementation. The implementation is carried out in the parallel decision diagram framework Sylvan [16], in the language-independent model checker LTSmin [22]. We empirically evaluate the speedup of parallel saturation on Petri nets from the Model Checking Contest [24], running the algorithm on up to 48 cores.

#### **1.1 Related Work**

The saturation strategy has been developed and improved by Ciardo et al. We refer to [13] for an extensive description of the algorithm. Saturation derives its efficiency from firing all local transitions that apply at a certain level of the decision diagram, before proceeding to the next higher level. An important step in the development of the saturation algorithm allows on-the-fly generation of the transition relations, without knowing the cardinality of the state variable domains in advance [12]. This is essential to implement saturation in LTSmin, which is based on the PINS interface to discover transitions on-the-fly.

Since saturation obtains its efficiency from a restrictive firing order, it seems inherently sequential. Yet the problem of parallelising saturation has been studied intensively. The first attempt, Saturation NOW [9], used a network of PCs. This version could exploit the collective memory of all PCs, but due to the sequential procedure, no speedup was achieved. By firing local transitions speculatively (but with care to avoid memory waste), some speedup has been achieved [10]. More relevant to our work is the parallelisation of saturation for a shared memory architecture [20]. The authors used CILK to schedule parallel work originating from firing multiple transitions at the same level. They reported some speedup on a dual-core machine, at the expense of a serious memory increase. Their method also required precomputing the transition relation. An improvement of the parallel synchronisation mechanism was provided in [31]. They reported a parallel speedup of 2× on 4 CPUs. Moreover, their implementation supports learning the transition relation on-the-fly. Still, the successful parallelisation of saturation remained widely open, as indicated by Ciardo [14]: "Parallel symbolic state-space exploration is difficult, but what is the alternative?"

For an extensive overview of parallel decision diagrams on various hardware architectures, see [15]. Here we mention some other approaches to parallel symbolic model checking, different from saturation for reachability analysis. First, Grumberg and her team [21] designed a parallel BDD package based on vertical partitioning. Each worker maintains its own sub-BDD. Workers exchange BDD nodes over the network. They reported some speedup on 32 PCs for BDD based model checking under the BFS strategy. The Sylvan [16] multi-core decision diagram package supports symbolic on-the-fly reachability analysis, as well as bisimulation minimisation [17]. Oortwijn [28] experimented with a heterogeneous distributed/multi-core architecture, by porting Sylvan's architecture to RDMA over MPI, running symbolic reachability on 480 cores spread over 32 PCs and reporting speedups of BFS symbolic reachability up to 50. Finally, we mention some applications of saturation beyond reachability, such as model checking CTL [32] and detecting strongly connected components to detect fair cycles [33].

#### **1.2 Contribution**

Here we show that implementing saturation on top of the multi-core decision diagram framework Sylvan [16] yields a considerable speedup in a shared-memory setting of up to 32.5× on 48 cores with pre-learned transition relations, and 52.2× with on-the-fly transition learning.

By design decision, our implementation reuses several features provided by Sylvan, such as: its own fine-grained, work-stealing framework Lace [18], its implementation of both BDDs (Binary Decision Diagrams) and LDDs (a List-implementation of Multiway Decision Diagrams), its concurrent unique table and operations cache, and finally, its parallel operations like set union and relational product. As a consequence, the pseudocode of the algorithm and additional code for saturation is quite small, and orthogonal to other BDD features. To improve orthogonality with the existing decision diagrams, we deviated from the standard presentation of saturation [13]: we never update BDD nodes in situ, and we eliminated the mutual recursion between saturation and the BDD operations for relational product to fire transitions.

The implementation is available in the open-source high-performance model checking tool LTSmin [22], with its language-agnostic interface, Partitioned Next-State Interface (PINS) [5,22,25]. Here, a specification basically provides a next-state function equipped with dependency information, from which LTSmin can derive locality information. We fully support the flexible method of learning the transition relation on-the-fly during saturation [12]. As a consequence, our contribution extends the tool LTSmin with saturation for various specification languages, like Promela, DVE, Petri nets, mCRL2, and languages supported by the ProB model checker. See Sect. 4 on how to use saturation in LTSmin.

The experiments with saturation in Sylvan are carried out in LTSmin as well. We used Petri nets from the MCC competition. Our experimental design has been carefully set up in order to facilitate fair comparisons. Besides learning the transition relations on-the-fly, we also pre-learned them in order to measure the overhead of learning and to eliminate its effect in comparisons. It is well known that the variable ordering has a large effect on the BDD sizes [29]. Hence, our experiments are based on two of the best static variable orderings known, Sloan [26] and Force [1]. In particular, our experiments measure and compare:


### **2 Preliminaries**

This paper proposes an algorithm for decision diagrams to perform the fixed point application of multiple transition relations according to the saturation strategy, combined with on-the-fly transition learning as implemented in LTSmin. We briefly review these concepts in the following.

#### **2.1 Partitioned Transition Systems**

A transition system (TS) is a tuple $(S, \to, s^0)$, where $S$ is a set of states, $\to \subseteq S \times S$ is a transition relation and $s^0 \in S$ is the initial state. We define $\to^*$ to be the reflexive and transitive closure of $\to$. The set of reachable states is $R = \{s \in S \mid s^0 \to^* s\}$. The goal of this work is to compute $R$ via a novel multi-core saturation strategy.
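To make the definition of $R$ concrete, the following is a minimal explicit-state sketch in Python. It is not the symbolic algorithm of this paper; the function name `reachable` and the toy successor function are assumptions introduced only for illustration.

```python
from collections import deque

def reachable(initial, successors):
    """Return the set R of states reachable from `initial`.

    `successors(s)` is assumed to yield every t with s -> t.
    """
    seen = {initial}
    frontier = deque([initial])
    while frontier:
        state = frontier.popleft()
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Hypothetical toy system: a counter modulo 4 that can only step by 2.
def step_by_two(s):
    return [(s + 2) % 4]

R = reachable(0, step_by_two)
```

Here $S = \{0, 1, 2, 3\}$, but only the even states are reachable from $s^0 = 0$, which illustrates that $R$ can be much smaller than $S$.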

In this paper, we evaluate multi-core saturation using Petri nets. Figure 1 shows an example of a (safe) Petri net. We show its initial marking, which is the initial state. A Petri net transition can fire if there is a token in each of its source places. On firing, these tokens are consumed and tokens in each target place are generated. For example, $t_1$ will produce one token in both $p_2$ and $p_5$, if there is a token in $p_4$. Transition $t_6$ requires a token in both $p_3$ and $p_1$ to fire. The markings of this Petri net form the states of the corresponding TS, so here $|S| = 2^5 = 32$. From the initial marking shown, four more markings are reachable, connected by 10 enabled transition firings. This means $|R| = 5$, and $|\to| = 10$.
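The firing rule for safe Petri nets can be sketched as follows. Markings are modelled as sets of marked places, and a transition as a pair of source and target place sets. Only the source and target places of `t1` and the source places of `t6` come from the text; the target of `t6` is a placeholder assumption, since Fig. 1 is not reproduced here.

```python
def enabled(marking, transition):
    """A transition can fire if every source place holds a token."""
    sources, _targets = transition
    return sources <= marking

def fire(marking, transition):
    """Consume the tokens in the source places, produce tokens in the targets."""
    sources, targets = transition
    return (marking - sources) | targets

# t1 as described in the text: consumes from p4, produces into p2 and p5.
t1 = (frozenset({"p4"}), frozenset({"p2", "p5"}))
# t6 needs tokens in p1 and p3; its target place is a hypothetical choice.
t6 = (frozenset({"p1", "p3"}), frozenset({"p4"}))

m0 = frozenset({"p4"})   # an illustrative marking, not necessarily that of Fig. 1
m1 = fire(m0, t1)
```

Each transition reads and writes only a few places, which is the event locality that saturation exploits.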

Notice that transitions in Petri nets are quite local; transitions consume from, and produce into relatively few places. The firing of a Petri net transition is called an event and the number of involved places is known as the *degree of event locality*. This notion is easily defined for other asynchronous specification languages and can be computed by a simple control flow graph analysis.

**Fig. 1.** Example Petri net

To exploit event locality, saturation requires a disjunctive partitioning of the transition relation →, giving rise to a Partitioned Transition System (PTS). In a PTS, states are vectors of length *N*, and → is partitioned as a union of *M* transition groups. A natural way to partition a Petri net is by viewing each transition as a transition group. For Fig. 1 this means we have *N* = 5 and *M* = 6. After disjunctive partitioning, each transition group depends on very few entries of the state vector. This allows for efficiently computing the reachable state space for the large class of asynchronous specification languages. LTSmin supports commonly used specification languages, like DVE, mCRL2, Promela, PNML for Petri nets, and languages supported by ProB.

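**Fig. 2.** LDD for $\{\langle 0,0\rangle, \langle 0,2\rangle, \langle 0,4\rangle, \langle 1,0\rangle, \langle 1,2\rangle, \langle 1,4\rangle, \langle 3,2\rangle, \langle 3,4\rangle, \langle 5,0\rangle, \langle 5,1\rangle, \langle 6,1\rangle\}$.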

#### **2.2 Decision Diagrams**

Binary decision diagrams (BDDs) are a concise and canonical representation of Boolean functions $\mathbb{B}^N \to \mathbb{B}$ [7]. A BDD is a rooted directed acyclic graph with leaves 0 and 1. Each internal node $v$ has a variable label $x_i$, denoted by var($v$), and two outgoing edges labeled 0 and 1, denoted by low($v$) and high($v$). The efficiency of *reduced, ordered* BDDs is achieved by minimizing the structure with some invariants: The BDD may neither contain *equivalent nodes*, with the same var($v$), low($v$) and high($v$), nor *redundant nodes*, with low($v$) = high($v$). Also, the variables must occur according to a fixed ordering along each path.
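The two reduction invariants are typically enforced at node creation time. The following Python sketch illustrates the idea with a dictionary as unique table; it is not Sylvan's implementation, and the name `make_node` and the tuple encoding of nodes are assumptions for illustration.

```python
# Leaves are the integers 0 and 1; internal nodes are (var, low, high) tuples.
unique_table = {}

def make_node(var, low, high):
    """Create or retrieve a reduced BDD node."""
    if low == high:
        # Redundant node: both edges lead to the same child, so skip it.
        return low
    key = (var, low, high)
    # Equivalent nodes: return the single stored copy if one already exists.
    return unique_table.setdefault(key, key)

x1 = make_node(1, 0, 1)        # the function "x1"
again = make_node(1, 0, 1)     # retrieved from the unique table, not duplicated
skipped = make_node(0, x1, x1) # redundant test on x0 collapses to x1
```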

Multi-valued or multiway decision diagrams (MDDs) generalize BDDs to finite domains ($\mathbb{N}^N \to \mathbb{B}$). Each internal MDD node with variable $x_i$ now has $n_i$ outgoing edges, labeled 0 to $n_i - 1$. We use quasi-reduced MDDs with sparse nodes. In the sparse representation, values with edges to leaf 0 are skipped from MDD nodes, so outgoing edges must be explicitly labeled with remaining domain values. Contrary to BDDs, MDDs are usually "quasi-reduced", meaning that variables are never skipped. In that case, the variable $x_i$ can be derived from the depth of the MDD, so it is not stored.

A variation of MDDs are list decision diagrams (LDDs) [5,16], where sparse MDD nodes are represented as a linked list. See Fig. 2 for two visual representations of the same LDD. Each LDD node contains a value, a "down" edge for the corresponding child, and a "right" edge pointing to the next element in the list. Each list ends with the leaf 0 and each path from the root downwards ends with the leaf 1. The values in an LDD are strictly ordered, i.e., the values must increase to the "right".

LDD nodes have the advantage that common suffixes can be shared: The MDD for Fig. 2a requires two more nodes, one for [2*,* 4] and one for [1], because edges can only point to an entire MDD node. LDDs suffer from an increased memory footprint and inferior memory locality, but their memory management is simpler, since each LDD node has a fixed small size.
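The suffix sharing can be observed in a small Python sketch that builds an LDD for the set of Fig. 2. Nodes are encoded as `(value, down, right)` tuples with leaves 0 and 1, and a dictionary plays the role of the unique table; the names `build_ldd` and `members` are assumptions for illustration and do not correspond to Sylvan functions.

```python
def build_ldd(vectors, table):
    """Build an LDD for a sorted list of equal-length tuples."""
    if not vectors:
        return 0                      # end of a list
    if len(vectors[0]) == 0:
        return 1                      # bottom of the diagram
    first = vectors[0][0]
    # "down": the remaining components of all vectors starting with `first`.
    down = build_ldd([v[1:] for v in vectors if v[0] == first], table)
    # "right": all vectors starting with a larger value.
    right = build_ldd([v for v in vectors if v[0] != first], table)
    key = (first, down, right)
    return table.setdefault(key, key)

def members(node, prefix=()):
    """Enumerate the vectors stored in an LDD."""
    if node == 0:
        return
    if node == 1:
        yield prefix
        return
    value, down, right = node
    yield from members(down, prefix + (value,))
    yield from members(right, prefix)

FIG2 = {(0, 0), (0, 2), (0, 4), (1, 0), (1, 2), (1, 4),
        (3, 2), (3, 4), (5, 0), (5, 1), (6, 1)}
table = {}
root = build_ldd(sorted(FIG2), table)
node_count = len(table)
```

In this encoding the list for value 3 is the tail of the list for values 0 and 1, and the list for value 6 is the tail of the list for value 5, so no extra nodes are created for them.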


**Fig. 3.** Dependency matrices of Fig. 1.

#### **2.3 Variable Orders and Event Locality**

Good variable orders are crucial for efficient operations on decision diagrams. The syntactic variable order from the specification is often inadequate for the saturation algorithm to perform well. Hence, finding a good variable order is necessary. Variable reordering algorithms use heuristics based on event locality. The locality of events can be illustrated with dependency matrices. The size of those matrices is *M* × *N*, where *M* is the number of transition groups, and *N* is the length of the state vector. The order of columns in dependency matrices determines the order of variables in the DD. Figure 3a shows the natural order on places in Fig. 1. A measure of event locality is called *event span* [29]. Lower event span is correlated to a lower number of nodes in decision diagrams. This can be seen in LDDs in Figs. 4a and b that are ordered according to columns in Figs. 3a and b respectively.

**Fig. 4.** Reachable states as LDDs with different orders on places

Event span is defined as the sum over all rows of the distance from the leftmost non-zero column to the rightmost non-zero column. The event span of Fig. 3a is 22 (= 4+2+2+5+5+4); the event span of Fig. 3b is 16, which is better. Optimizing the event span and thus variable order of DDs is NP-complete [6], yet there are heuristic approaches that run in subquadratic time and provide good enough orders. Commonly used algorithms are Noack [27], Force [1] and Sloan [30]. Noack creates a permutation of variables by iteratively minimizing some objective function. The Force algorithm acts as if there are springs in between nonzeros in the dependency matrix, and tries to minimize the average tension among them. Sloan tries to minimize the profile of matrices. In short, profile is the symmetric counterpart to event span. For a more detailed overview of these algorithms see [3]. In our empirical evaluation we use both Sloan and Force, because these have been shown to give the best results [2,26].
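Event span as defined above can be computed directly from a dependency matrix. In the sketch below, the distance is counted inclusively (rightmost minus leftmost plus one), an assumption inferred from the quoted sum, where a row spanning all five columns contributes 5. The matrix is a made-up example, not the one of Fig. 3.

```python
def event_span(matrix):
    """Sum over all rows of the inclusive distance between the
    leftmost and rightmost non-zero column."""
    total = 0
    for row in matrix:
        cols = [j for j, entry in enumerate(row) if entry]
        if cols:
            total += cols[-1] - cols[0] + 1
    return total

def reorder_columns(matrix, order):
    """Apply a variable order: column j of the result is column order[j]."""
    return [[row[j] for j in order] for row in matrix]

# Hypothetical 3 x 4 dependency matrix (rows: transition groups).
D = [[1, 0, 0, 1],
     [0, 1, 1, 0],
     [0, 0, 1, 0]]

span_before = event_span(D)
span_after = event_span(reorder_columns(D, [1, 2, 0, 3]))
```

Moving the two variables of the first row next to each other reduces the span from 7 to 5 in this example.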

#### **2.4 The Saturation Strategy**

The saturation strategy for reachability analysis, i.e., the transitive closure of transition relations applied to some set of states, was first proposed by Ciardo et al.; see [11,13] for an overview. Saturation was combined with on-the-fly transition learning in [12]. Besides reachability, saturation has also been applied to CTL model checking [32] and in checking fairness constraints with strongly connected components [33].

Saturation is well-studied. The core idea is to always fire enabled transitions at the lower levels in the decision diagram, before proceeding to the next level. This tends to keep the intermediate BDD sizes much smaller than for instance the breadth-first exploration strategy. This is in particular the case for asynchronous systems, where transitions exhibit locality. There is also a major influence from the variable reordering: if the variables involved in a transition are grouped together, then this transition only affects adjacent levels in the decision diagram.

We refer to [13] for a precise description of saturation. Our implementation deviates from the standard presentation in three ways. First, we implemented saturation for LDDs and BDDs, instead of MDDs. Next, we never update nodes in the LDD forest in situ; instead, we always create new nodes. Finally, the standard presentation has a mutual recursion between *saturation* and *firing transitions*. Instead, we fire transitions using the existing function for relational product, which is called from our saturation algorithm. As a consequence, the extension with saturation becomes more orthogonal to the specific decision diagram implementation. We refer to Sect. 3 for a detailed description of our algorithm. We show in Sect. 5 that these design decisions do not introduce computational overhead.

### **3 Multi-core Saturation Algorithm**

To access the three elements of an LDD node *x*, Sylvan [16] provides the functions value(*x*), down(*x*), right(*x*). To create or retrieve a unique LDD node using the hash table, Sylvan provides LookupLDDNode(*value, down, right*).

Furthermore, Sylvan provides several operations on LDDs that we use to implement reachability algorithms, such as union(*A, B*) to compute the set union *A* ∪ *B* and minus(*A, B*) to compute the set difference *A* \ *B*. For transition relations, Sylvan provides an operation relprod(*S, R*) to compute the successors of *S* with transition relation *R*, and an operation relprodunion(*S, R*) that computes union(*S,* relprod(*S, R*)), i.e., computing the successors and adding them to the given set of states, in one operation. All these operations are internally parallelized, as described in [16].
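The meaning of these operations can be stated on explicit sets. The Python sketch below only mirrors their semantics, with a relation given as a set of (source, target) pairs; Sylvan's actual operations work on decision diagrams and are parallelized, which this sketch does not attempt to reproduce.

```python
def union(A, B):
    return A | B

def minus(A, B):
    return A - B

def relprod(S, R):
    """Successors of the states in S under relation R."""
    return {target for (source, target) in R if source in S}

def relprodunion(S, R):
    """S together with its successors under R."""
    return union(S, relprod(S, R))

# Hypothetical relation over integer states.
R = {(0, 1), (1, 2)}
successors = relprod({0}, R)
extended = relprodunion({0}, R)
```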

We implement multi-core saturation as in Algorithm 1. We have a transition relation disjunctively partitioned into $M$ relations $R_0 \ldots R_{M-1}$. These relations are sorted by the level (depth) of the decision diagram where they are applied, which is the first level touched by the relation. We say that relation $R_i$ is applied at depth $d_i$. We identify the current next relation with a number $k$, $0 \le k \le M$, where $k = M$ denotes "no next relation". Decision diagram levels are sequentially numbered with 0 for the root level.

**global:** $M$ transition relations $R_0 \ldots R_{M-1}$ starting at depths $d_0 \ldots d_{M-1}$

```
 1 def saturate(S, k, d):
 2   if S = 0 ∨ S = 1 : return S
 3   if k = M : return S
 4   if result ← cache[(S, k, d)] : return result
 5   if d = d_k :
 6     k′ ← next relation k < k′ < M where d_k′ ≠ d, or M
 7     while S changes :
 8       S ← saturate(S, k′, d)
 9       for i ∈ [k, k′) : S ← relprodunion(S, R_i)
10     result ← S
11   else:
12     do in parallel:
13       right ← saturate(right(S), k, d)
14       down ← saturate(down(S), k, d + 1)
15     result ← LookupLDDNode(value(S), down, right)
16   cache[(S, k, d)] ← result
17   return result
```
**Algorithm 1:** The multi-core saturation algorithm, which, given a set of states $S$, next transition relation $k$ and current decision diagram depth $d$, exhaustively applies all transition relations $R_k \ldots R_{M-1}$ using the saturation strategy.

The saturate algorithm is given the initial set of states S and the initial next transition relation *k* = 0 and the initial decision diagram level *d* = 0. The algorithm is a straightforward implementation of saturation. First we check the easy cases where we reach either the end of an LDD list, where *S* = 0, or the bottom of the decision diagram, where *S* = 1. If there are no more transition relations to apply, then *k* = *M* and we can simply return *S*. When we arrive at line 4, the operation is not trivial and we consult the operation cache.

If the result of this operation was not already in the cache, then we check whether we have relations at the current level. Since the relations are sorted by the level where they must be applied, we compare the current level $d$ with the level $d_k$ of the next relation $k$. If we have relations at the current level, then we perform the fixed point computation where we first saturate $S$ for the remaining relations, starting at relation $k'$, which is the first relation that must be applied on a deeper level than $d$, and then apply the relations of the current level, that is, all $R_i$ where $k \le i < k'$. If no relations match the current level, then we compute in parallel the results of the suboperations for the LDD of successor "right" and for the LDD of successor "down". After obtaining these sub results, we use LookupLDDNode to compute the final result for this LDD node. Finally, we store this result in the operation cache and return it.
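The selection of the first relation on a deeper level (line 6 of Algorithm 1) relies only on the relations being sorted by depth. A minimal Python sketch, with the hypothetical helper name `next_deeper_relation` and made-up depths:

```python
def next_deeper_relation(depths, k):
    """Return the first index k2 > k whose depth differs from depths[k],
    or len(depths) (that is, M) if no such relation exists.

    `depths` must be sorted in non-decreasing order, so a different
    depth is necessarily a deeper one.
    """
    d = depths[k]
    k2 = k + 1
    while k2 < len(depths) and depths[k2] == d:
        k2 += 1
    return k2

# Hypothetical depths d_0 ... d_4 of five relations.
depths = [0, 0, 1, 3, 3]
```

With these depths, the relations applied at depth 0 are those with indices in `[0, 2)`, and the recursive call for deeper levels starts at relation 2.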

The **do in parallel** keyword is implemented with the work-stealing framework Lace [18], which is embedded in Sylvan [16] and offers the primitives spawn and sync to create subtasks and wait for their completion. The implementation using spawn and sync of lines 12–14 is as follows.

```
12 spawn(saturate(right(S), k, d))
13 down ← saturate(down(S), k, d + 1)
14 right ← sync()
```
The implementation of multi-core saturation for BDDs is identical, except that we parallelize on the "then" and "else" successors of a BDD node, instead of on the "down" and "right" successors of an LDD node.

To add on-the-fly transition relation learning to this algorithm, we simply modify the loop at line 9 as follows:

```
 9 for i ∈ [k, k′) :
10   learn-transitions(S, i, d)
11   S ← relprodunion(S, R_i)
```
The learn-transitions function provided by LTSmin updates relation $i$ given a set of states $S$. The function first restricts $S$ to so-called short states $S_i$, which is the projection of $S$ on the state variables that are touched by relation $i$. Then it calls the next-state function of the PINS interface for each new short state and it updates $R_i$ with the new transitions.
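The projection to short states and the learning step can be illustrated on explicit sets. The sketch below is not LTSmin's code: `project`, `learn_transitions`, the `known_short` bookkeeping set and the `next_state` callback (standing in for the PINS next-state function) are all assumptions for illustration.

```python
def project(states, touched):
    """Project full state vectors onto the variable indices in `touched`."""
    return {tuple(s[i] for i in touched) for s in states}

def learn_transitions(states, touched, next_state, known_short, relation):
    """Extend `relation` with transitions from short states not seen before."""
    for short in project(states, touched) - known_short:
        known_short.add(short)
        for successor in next_state(short):
            relation.add((short, successor))

# Hypothetical transition group touching variables 0 and 2; it swaps them.
def swap(short):
    return [(short[1], short[0])]

S = {(0, 5, 1), (0, 7, 1), (1, 5, 0)}
known, R_i = set(), set()
learn_transitions(S, (0, 2), swap, known, R_i)
```

Three full states collapse to two short states here, so the next-state function is called only twice.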

Updating transition relations from multiple threads is not completely trivial. LTSmin solves this using lock-free programming with the compare-and-swap operation. After collecting all new transitions, LTSmin computes the union with the known transitions and uses compare-and-swap to update the global relation; if this fails, the union is repeated with the new known transitions.
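The retry loop described above follows the usual compare-and-swap pattern. Python has no hardware compare-and-swap primitive, so the sketch below simulates one with a lock inside a hypothetical `AtomicRef` class; only the structure of `add_transitions` is meant to mirror the description, not LTSmin's actual code.

```python
import threading

class AtomicRef:
    """Simulated atomic reference; the lock stands in for a hardware CAS."""
    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def get(self):
        return self._value

    def compare_and_swap(self, expected, new):
        with self._lock:
            if self._value is expected:
                self._value = new
                return True
            return False

def add_transitions(ref, new_transitions):
    """Merge new transitions into the shared relation, retrying on conflict."""
    while True:
        known = ref.get()
        merged = known | new_transitions   # union with the known transitions
        if ref.compare_and_swap(known, merged):
            return
        # Another thread updated the relation first: repeat with the new value.

relation = AtomicRef(frozenset())
workers = [threading.Thread(target=add_transitions,
                            args=(relation, frozenset({(i, i + 1)})))
           for i in range(4)]
for w in workers:
    w.start()
for w in workers:
    w.join()
```

Because a failed swap leads to recomputing the union from the latest value, no update is lost regardless of how the threads interleave.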

### **4 Contributed Tools**

We present several new tools and extensions to existing tools produced in this work. The new tools support experiments and comparisons between various DD formats. The extension to Sylvan and LTSmin provides end-users with multicore saturation for reachability analysis.

#### **4.1 Tools for Experimental Purposes**

For the empirical evaluation, we need to isolate the reachability analysis of a given LDD (or BDD or MDD). To that end, we implemented three small tools that only compute the set of reachable states, namely lddmc for LDDs, bddmc for BDDs and medmc for MDDs using the library Meddly. These tools are given an input file representing the model, compute the set of reachable states, and report the number of states and the required time to compute all reachable states. Additionally we provide the tools ldd2bdd and ldd2meddly that convert an LDD file to a BDD file and to an MDD file. The LDD input files are generated using LTSmin (see below). These tools can all be found online<sup>1</sup>.

#### **4.2 Tools for On-The-Fly Multi-core Saturation**

On-the-fly multi-core saturation is implemented in the LTSmin toolset, which can be found online<sup>2</sup>. The examples in this section are also online<sup>3</sup>. On-the-fly multicore saturation for Petri nets is available in LTSmin's tool pnml2lts-sym. This tool computes all reachable markings with parallel saturation. The command line to run it on Fig. 1 is pnml2lts-sym pnml/example.pnml --saturation=sat. The tool reports: pnml2lts-sym: state space has 5 states, 16 nodes. That is, there are 5 reachable markings and the final LDD has 16 nodes.

Here the syntactic variable order of the places in pnml/example.pnml is used. To use a better variable order, the option -r is added to the command line. For instance adding -rf runs *Force*, while -rbs runs *Sloan*'s algorithm (as implemented in the well-known Boost library). Running pnml2lts-sym pnml/example.pnml --saturation=sat -rf reports that the final LDD has only 12 nodes.

The naming convention of LTSmin's binaries follows the Partitioned Next-State Interface (PINS) architecture [5,22,25]. PINS forms a bridge between several language front-ends and algorithmic back-ends. Consequently, besides pnml2lts-sym, LTSmin also provides {pnml,dve,prom}2lts-{dist,mc,sym} and several other combinations. These binaries generate the state space for the languages PNML, DVE and Promela, by means of distributed explicit-state, multicore explicit-state and multi-core symbolic algorithms, respectively. Additionally, LTSmin supports checking for deadlocks and invariants, and verifying LTL properties and *µ*-calculus formulas. In this work we focus on state space generation with the symbolic back-end only.

<sup>1</sup> https://github.com/trolando/sylvan.

<sup>2</sup> https://github.com/utwente-fmt/ltsmin.

<sup>3</sup> https://github.com/trolando/ParallelSaturationExperiments.

We now demonstrate multi-core saturation for Promela models. Consider the file Promela/garp 1b2a.prm which is an implementation of the GARP protocol [23]. To compute the reachable state space with the proposed algorithm and Force order, run: prom2lts-sym --saturation=sat Promela/garp 1b2a.prm -rf. On a consumer laptop with 8 hardware threads, LTSmin reports 385,000,995,634 reachable states within 1 min. To run the example with a single worker, run prom2lts-sym --saturation=sat Promela/garp 1b2a.prm -rf --lace-workers=1. On the same laptop, the algorithm runs in 4 min with 1 worker. We thus have a speedup of 4× with 8 workers for symbolic saturation on a Promela model.

### **5 Empirical Evaluation**

Our goal with the empirical study is five-fold. *First*, we compare our parallel implementation with only 1 core to the purely sequential implementation of the MDD library Meddly [4], in order to determine whether our implementation is competitive with the state-of-the-art. *Second*, we study parallel scalability up to 16 cores for all models and up to 48 cores with a small selection of models. *Third*, we compare parallel saturation with LDDs to parallel saturation with ordinary BDDs, to see if we get similar results with BDDs. *Fourth*, we compare parallel saturation without on-the-fly transition learning to on-the-fly parallel saturation, to see the effects of on-the-fly transition learning on the performance of the algorithm. *Fifth*, we compare parallel saturation with other reachability strategies, namely chaining and BFS, to confirm whether saturation is indeed a better strategy than chaining and BFS.

To perform this evaluation, we use the P/T Petri net benchmarks obtained from the Model Checking Contest 2016 [24]. These are 491 models in total, stored in PNML files. We use parallel on-the-fly saturation (in LTSmin) with a generous timeout of 1 hour to obtain LDD files of the models, using the Force variable ordering and using the Sloan variable ordering. In total, 413 of potentially 982 LDD files were generated. These LDD files simply store the list decision diagrams of the initial states and of all transition relations. We convert the LDD files to BDD files (binary decision diagrams) with an optimal number of binary variables. We also convert the LDD files to MDD files for the experiments using Meddly. This ensures that all solvers have *the same input model with the same variable order*.


**Table 1.** The six solving methods that we use in the empirical evaluation. Five methods are parallelized and one method is on-the-fly.

**Table 2.** Number of benchmarks (out of 413) solved within 20 min with each method with the given number of workers.


See Table 1 for the list of solving methods. As described in Sect. 4, we implement the tools lddmc, bddmc and medmc to isolate reachability computation for the purposes of this comparison, using respectively the LDDs and BDDs of Sylvan and the MDDs of Meddly. The on-the-fly parallel saturation using LDDs is performed with the pnml2lts-sym tool of LTSmin. We use the command line pnml2lts-sym ORDER --lace-workers=WORKERS --saturation=sat FILE, where ORDER is -rf for Force and -rbs for Sloan and WORKERS is a number from the set {1*,* 2*,* 4*,* 8*,* 16}.

All experimental scripts, input files and log files are available online (see footnote 3). The experiments are performed on a cluster of Dell PowerEdge M610 servers with two Xeon E5520 processors and 24 GB internal memory each. The tools are compiled with gcc 5.4.0 on Ubuntu 16.04. The experiments for up to 48 cores are performed on a single computer with 4 AMD Opteron 6168 processors with 12 cores each and 128 GB internal memory.

When reporting on parallel executions, we use *the number of workers* for how many hardware threads (cores) were used.

*Overview.* After running all experiments, we obtain the results for 413 models in total, of which 196 models with the Force variable ordering and 217 models with the Sloan variable ordering. In the remainder of this section, we study these 413 benchmarks. See Table 2, which shows the number of models for which each method could compute the set of reachable states within 20 min.

**Table 3.** Cumulative time and parallel speedups for each method-#workers combination on the models where all methods solved the model in time. These are 301 models in total: 151 models with Force, 150 models with Sloan.

To correctly compare all runtimes, we restrict the set of models to those where all methods finish within 20 min with any number of workers. We retain in total 301 models where no solver hit the timeout. See Table 3 for the cumulative times for each method and number of workers and the parallel speedup. Notice that this is the speedup for the *entire* set of 301 models and not for individual models.

*Comparing LDD saturation with Meddly's saturation.* We evaluate how ldd-sat with just 1 worker compares to the sequential saturation of Meddly. The goal is not to measure directly whether parallelism in Sylvan incurs an overhead, because the algorithm in lddmc is fundamentally different: it uses LDDs instead of MDDs and it does not saturate nodes in place, as also explained in Sect. 3. The low parallel overheads of Sylvan are already demonstrated elsewhere [15,16,18]. Rather, the goal is to see how our version of saturation compares to the state-of-the-art.

Table 2 shows that Meddly's implementation (mdd-sat) and our implementation (ldd-sat 1) are quite similar in the number of solved models. Meddly solves 375 benchmarks and our implementation solves 388 within 20 min.

See Table 3 for a comparison of runtimes. Meddly solves the 150 models with Sloan almost 2× as fast as our implementation in Sylvan, but is slower than our implementation for the 151 models with Force. We observe for individual models that the difference between the two solvers is within an order of magnitude for most models, although there are some exceptions. Our implementation quickly overtakes Meddly with additional workers.

**Table 4.** Parallel speedup for a selection of benchmarks on the 48-core machine (only top 5 shown)

*Parallel Scalability.* As shown in Table 3, using 16 workers, we obtain a modest parallel speedup for saturation of 6.2× (with Sloan) and 4.7× (with Force). On individual models, the differences are large. The average speedup of the individual benchmarks is only 1.8× with 16 workers, but there are many slowdowns for models that take less than a second with 1 worker. We take an arbitrary selection of models with a high parallel speedup and run these on a dedicated 48-core machine. Table 4 shows that even up to 48 cores, parallel speedup keeps improving. We even see a speedup of 52.2×. For this superlinear speedup we have two possible explanations. One is that there is some nondeterminism inherent in any parallel computation; another is already noted in [20] and is related to the "chaining" in saturation, see further [20].
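The gap between the cumulative speedup and the average of individual speedups arises because the cumulative figure is dominated by long-running models. A short Python sketch with entirely made-up runtimes (not taken from the experiments) shows the effect:

```python
def cumulative_speedup(times_1, times_n):
    """Speedup of the total time over the whole benchmark set."""
    return sum(times_1) / sum(times_n)

def mean_speedup(times_1, times_n):
    """Average of the per-model speedups."""
    ratios = [t1 / tn for t1, tn in zip(times_1, times_n)]
    return sum(ratios) / len(ratios)

# Hypothetical runtimes in seconds: one large model that scales well and
# two sub-second models that slow down with more workers.
t_1_worker = [100.0, 0.5, 0.2]
t_16_workers = [10.0, 1.0, 0.4]

cumulative = cumulative_speedup(t_1_worker, t_16_workers)
average = mean_speedup(t_1_worker, t_16_workers)
```

In this example the two small models each run twice as slowly with 16 workers, which pulls the average down while barely affecting the cumulative speedup.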

*Comparing LDD saturation with BDD saturation.* As Table 3 shows, the ldd-sat and bdd-sat method have a similar performance and similar parallel speedups.

*On-the-fly LDD saturation.* Comparing the performance of offline saturation with on-the-fly saturation, we observe the same scalability with the Sloan variable order, but on-the-fly saturation requires roughly 2× as much time. With the Force variable order, on-the-fly saturation is slower but has a higher parallel speedup of 7.9×.

*Comparing saturation, chaining and BFS.* We also compare the saturation algorithm with other popular strategies to compute the set of reachable states, namely standard (parallelized) BFS and chaining, given in Fig. 5. As Tables 2 and 3 show, chaining is significantly faster than BFS and saturation is again significantly faster than chaining. In terms of parallel scalability, we see that parallelized BFS scales better than the others, because it can already parallelize in the main loop by computing successors for all relations in parallel, which chaining and saturation cannot do. For the entire set of benchmarks, saturation is the superior method; however, there are individual differences and for some models, saturation is not the fastest method.

**global:** $M$ transition relations $R_0 \ldots R_{M-1}$

```
def bfs(S):
  U ← S
  while U ≠ ∅ :
    U ← par-next(U, 0, M)
    U ← minus(U, S)
    S ← union(U, S)
  return S

def par-next(S, i, k):
  if k = 1 : return relprod(S, R_i)
  do in parallel:
    left ← par-next(S, i, k/2)
    right ← par-next(S, i + k/2, k − k/2)
  return union(left, right)
```

```
def chaining(S):
  U ← S
  while U ≠ ∅ :
    for i ∈ [0, M) :
      U ← relprodunion(U, R_i)
    U ← minus(U, S)
    S ← union(U, S)
  return S
```
**Fig. 5.** Algorithms bfs and chaining implement the Parallel BFS and Chaining strategies for reachability.
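The difference between the two strategies of Fig. 5 can be observed on explicit sets. The following Python sketch is sequential (the parallel `par-next` is replaced by a plain loop) and additionally counts the iterations of the main loop, which is an addition for illustration; the relations are a made-up example.

```python
def relprod(S, R):
    """Successors of S under a relation given as (source, target) pairs."""
    return {t for (s, t) in R if s in S}

def bfs(initial, relations):
    S, U, iterations = set(initial), set(initial), 0
    while U:
        iterations += 1
        successors = set()
        for R in relations:          # sequential stand-in for par-next
            successors |= relprod(U, R)
        U = successors - S
        S |= U
    return S, iterations

def chaining(initial, relations):
    S, U, iterations = set(initial), set(initial), 0
    while U:
        iterations += 1
        for R in relations:
            U = U | relprod(U, R)    # relprodunion: later relations see new states
        U = U - S
        S |= U
    return S, iterations

# Hypothetical chain 0 -> 1 -> 2 -> 3, one relation per step.
relations = [{(0, 1)}, {(1, 2)}, {(2, 3)}]
bfs_result = bfs({0}, relations)
chaining_result = chaining({0}, relations)
```

Both strategies compute the same set, but chaining lets each relation build on the states produced by the previous one within a single iteration, so it needs fewer iterations of the main loop on this example.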

### **6 Conclusion**

We presented a multi-core implementation of saturation for the efficient computation of the set of reachable states. Based on Sylvan's multi-core decision diagram framework, the design of the saturation algorithm is mostly orthogonal to the type of decision diagram. We showed the implementation for BDDs and LDDs; the transition relation can be learned on-the-fly. The functionality is accessible through the LTSmin high-performance model checker. This makes parallel saturation available for a whole collection of asynchronous specification languages. We demonstrated multi-core saturation for Promela and for Petri nets in PNML representation.

We carried out extensive experiments on a benchmark of Petri nets from the Model Checking Contest. The total speedup of on-the-fly saturation is 5.9× on 16 cores with the Sloan variable ordering and 7.9× with the Force variable ordering. However, there are many small models (computed in less than a second) in this benchmark. For some larger models we showed an impressive 52× speedup on a 48-core machine. From our measurements, we further conclude that the efficiency and parallel speedup for the BDD variant is just as good as the speedup for LDDs. We compared efficiency and speedup of saturation versus other popular exploration strategies, BFS and chaining. As expected, saturation is significantly faster than chaining, which is faster than BFS; this trend is maintained in the parallel setting. Our measurements show that the variable ordering (Sloan versus Force), and the model representation (pre-computed transition relations versus learned on-the-fly) do have an impact on efficiency and speedup. Parallel speedup should not come at the price of reduced efficiency. To this end, we compared our parallel saturation algorithm for one worker to saturation in Meddly. Meddly solves fewer models within the timeout and is slightly faster in some other cases, but parallel saturation quickly overtakes Meddly with multiple workers.

Future work could include the study of parallel saturation on exciting new BDD types, like tagged BDDs and chained BDDs [8,19]. The results on tagged BDDs showed a significant speedup compared to ordinary BDDs on experiments in LTSmin with the BEEM benchmark database. Another direction would be to investigate the efficiency and speedup of parallel saturation in other applications, like CTL model checking, SCC decomposition, and bisimulation reduction.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Monitoring and Runtime Verification

# **Specification and Efficient Monitoring Beyond STL**

Alexey Bakhirkin(B) and Nicolas Basset

Univ. Grenoble Alpes, CNRS, Grenoble INP, VERIMAG, 38000 Grenoble, France abakhirkin@gmail.com

**Abstract.** An appealing feature of Signal Temporal Logic (STL) is the existence of efficient monitoring algorithms both for Boolean and real-valued robustness semantics, which are based on computing an aggregate function (conjunction, disjunction, min, or max) over a sliding window. On the other hand, there are properties that can be monitored with the same algorithms, but that cannot be directly expressed in STL due to syntactic restrictions. In this paper, we define a new specification language that extends STL with the ability to produce and manipulate real-valued output signals and with a new form of until operator. The new language still admits efficient offline monitoring, but also allows expressing some properties that in the past motivated researchers to extend STL with existential quantification, freeze quantification, and other features that increase the complexity of monitoring.

### **1 Introduction**

Signal Temporal Logic (STL [16,17]) is a temporal logic designed to specify properties of real-valued dense-time signals. It gained popularity due to its rigour and its ability to reason about analog and mixed signals, and it has found use in domains such as analog circuits, systems biology, and cyber-physical control systems (see [3] for a survey). A major use of STL is in monitoring: given a signal and an STL formula, an automated procedure can decide whether the formula holds at a given time point.

Monitoring of STL is reliably efficient. A monitoring procedure typically traverses the formula bottom up, and for every sub-formula computes a satisfaction signal, based on satisfaction signals of its operands. Boolean monitoring is based on the computation of conjunctions and disjunctions over a sliding window ("until" is implemented using a specialized version of running conjunction), and robustness monitoring (computing how well a signal satisfies a formula [9,10]) is based on the computation of minimum and maximum over a sliding window. The complexity of both Boolean and robustness monitoring is linear in the length of the signal and does not depend on the width of temporal windows appearing in the formula. At the same time, for a range of applications, pure STL is either not expressive enough or difficult to use, and specifying a desired property often becomes a puzzle of its own. The existence of robustness and other real-valued semantics does not always help, since a monitor can perform a limited set of operations that the semantics assigns to Boolean operators. For example, for robustness semantics, min and max are the only operations beyond the atomic proposition level.

This work was partially supported by the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013)/ERC Grant Agreement nr. 306595 "STATOR".

One way to work around the expressiveness issues of STL is pre-processing: a computation that cannot be performed by an STL monitor can be performed by a pre-processor and supplied as an extra input signal. For a number of reasons, this is not always satisfactory. First, for monitoring of continuous-time signals, there is a big gap between the logical definitions of properties and the implementation of monitors. In the continuous-time setting, properties are defined using quantification, upper and lower bounds, and similar mathematical tools for dense sets, while a monitor works with a finite piecewise representation of a signal and performs a computation that is based on induction and other tools for discrete sets. Leaving this gap exposed to the user, who has to implement the pre-processing step, is not very user-friendly. Second, monitoring of some properties cannot be cleanly decomposed into a pre-processing step followed by standard STL monitoring. Later, we give a concrete example using an extended "until" operator; for now, notice that "until" instructs the monitor to compute a conjunction over a window that is not fixed in advance, but is defined by its second operand. Because of that, multiple researchers have been motivated to search for a more expressive superset of STL that would allow them to specify the properties they were interested in.

One direction for extension is to add to the original quantifier-free logic (MTL, STL) a form of variable binding: a freeze quantifier as in STL\* [6], a clock reset as in TPTL [1], or even first order quantification [2]. Unfortunately, such extensions are detrimental to complexity of monitoring. When monitoring logics with quantifiers using standard bottom-up approach, subformulas containing free variables evaluate not to Boolean- or real-valued signals, but to maps from time to non-convex sets, and they cannot in general be efficiently manipulated (although for some classes of formulas monitoring of logics with quantifiers works well [4,13]). Perhaps the most benign in this respect but also least expressive extension is 1-TPTL (TPTL with one active clock), which is as expressive as MITL, but is easier to use and admits a reasonably efficient monitoring procedure [11].

An alternative direction is to define a quantifier-free specification language with more flexible syntax and sliding window operations. For example, Signal Convolution Logic (SCL [20]) allows specifying properties using convolution with a set of select kernels. In particular, it can express properties of the form "statement ϕ holds on an interval for at least X% of the time". In SCL, every formula has a Boolean satisfaction signal, but some works go further and allow a formula to produce a real-valued output signal based on the real-valued signals of its subformulas. This already happens for robustness of STL in a very limited way, and can be extended. For example, [19] presents temporal logic monitoring as filtering, which allows deriving multiple different real-valued semantics. Another work [7] focuses on the practical application of robustness in falsification and allows choosing between different possible robust semantics for "eventually" and "always", in particular to replace min or max with integration where necessary.

This paper is our take on extending STL in the latter direction. We define a specification language that is more expressive than STL, but not less efficient to monitor offline, i.e., the complexity of monitoring is linear in the length of the signal and does not depend on the width of temporal windows in the formula (the latter property tends to be missing from the STL extensions, even when the authors can achieve linear complexity for a fixed formula). The most important features of the new language are as follows.


Finally, we focus our attention on continuous-time piecewise-constant and piecewise-linear signals; we describe the algorithms and prepare an implementation only for piecewise-constant signals.

### **2 Motivating Examples**

Before formally defining the new language, let us look at some examples of properties that we would like to express. In particular, we look at properties that motivated the development of more expressive and harder to monitor logics.

**Example 1 (Stabilization).** The first interesting property is stabilization around a value that is not known in advance, e.g., "*x* stays within 0.05 units of some value for at least 200 time units". It is tempting to formalize this property using existential quantification, "there exists a threshold *v* such that...", which is possible with first-order logic of signals (and was one of its motivational properties [2]), but it is actually not necessary. Instead, we can compute the minimum and maximum of *x* over the next 200 time units and compare their distance to $0.1 = 2 \cdot 0.05$. In some imaginary language, we could write $\max\_{[0,200]} x - \min\_{[0,200]} x \le 0.1$. At this point we propose to separate the aggregate operators from the operator that defines the temporal window, which will be useful later, when the "until" operator defines a window of variable width. We use the operator $\operatorname{On}\_{[a,b]}$ to define the temporal window of constant width and the operators Min and Max (capitalized) to denote the minimum and maximum over the previously defined window.

*Signal x stabilizes within* 0.05 *units of an unknown value for* 200 *time units:*

$$\operatorname{On}\_{[0,200]} \operatorname{Max} x - \operatorname{On}\_{[0,200]} \operatorname{Min} x \le 0.1$$

Figure 1 shows an example of a signal *x*(*t*) (red) performing damped oscillation with the period of 250 time units. Blue and green curves are the maximum and the minimum of *x* over a sliding window [*t*, *t* + 200]. Finally, the orange Boolean signal (its *y* scale is on the right) evaluates to true (i.e., *y* = 1) when the maximum and minimum of *x* over the next 200 time units are within 0.1.
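The stabilization check can be illustrated on a uniformly sampled signal. The following is a minimal discrete-time sketch, not the continuous-time monitor discussed in this paper: the function name `stabilizes`, the sampling rate of one sample per time unit, the truncation of windows at the end of the signal, and the parameters of the damped oscillation are all assumptions made for illustration.

```python
import math

def stabilizes(samples, window, tolerance):
    """For each index i, check whether max - min over samples[i : i + window + 1]
    is at most 2 * tolerance. Windows are truncated at the end of the signal
    (an assumption of this sketch)."""
    result = []
    for i in range(len(samples)):
        chunk = samples[i:i + window + 1]
        result.append(max(chunk) - min(chunk) <= 2 * tolerance)
    return result

# Hypothetical damped oscillation with period 250, one sample per time unit.
signal = [math.exp(-t / 300) * math.cos(2 * math.pi * t / 250) for t in range(2000)]
verdict = stabilizes(signal, window=200, tolerance=0.05)
```

This brute-force version recomputes the minimum and maximum for every window, so its cost grows with the window width; it only illustrates the meaning of the property.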

**Example 2 (Local Maximum).** Consider the property: "the current value of *x* is a minimum or maximum in some neighbourhood of current time point". Previously, a similar property became a motivation to extend STL with freeze quantifiers [6], but we can also express it by comparing the value of a signal with some aggregate information about its neighbourhood, which we can do similarly to the previous example.

*Current value of x is a local maximum on the interval* [0, 85] *relative to the current time:*

$$x \ge \operatorname{On}\_{[0,85]} \operatorname{Max} x$$

Figure 2 shows an example of a sine wave *x*(*t*) (red) with the period of 250 time units. The blue curve is the maximum of *x* over a sliding window [*t*, *t* + 85]. The orange Boolean signal evaluates to true when the current value of *x* is a maximum for the next 85 time units.

**Fig. 1.** Damped oscillation *x*(*t*) and its maximum and minimum over the window [*t*, *t* + 200]. (Color figure online)

**Fig. 2.** Sine wave *x*(*t*), its maximum over the window [*t*, *t* + 200], and whether *x*(*t*) is a local maximum on the interval [*t*, *t* + 200]. (Color figure online)

**Example 3 (Stabilization Contd.).** We want to be able to assert that *x* becomes stable around some value not for a fixed duration, but until some signal *q* becomes true. We will be able to do this with our version of the "until" operator.

*Signal x is stable within* 0.05 *units of an unknown value until q becomes true:*

$$(\operatorname{Max} x \operatorname{U} q) - (\operatorname{Min} x \operatorname{U} q) \le 0.1$$

Intuitively, for a given time point, we want the monitor to find the closest future time point, where *q* holds and compute Min and Max of *x* over the resulting interval. Note that this property cannot be easily monitored in the framework of "STL with pre-processing", since it requires the monitor to compute Min and Max over a sliding window of variable width, which depends on the satisfaction signal of *q*.

**Example 4 (Linear Increase).** At this point, we can assert *x* to follow a more complex shape, for example, to increase or decrease with a given slope. Let *T* denote an auxiliary signal that linearly increases with rate 1 (like a clock of a timed automaton), i.e., we define *T*(*t*) = *t*; this example works as well for *T*(*t*) = *t* + *c*, where *c* is a constant. To specify that *x* increases with the rate 2.5, we assert that the distance from *x* to 2.5 · *T* stays within some bounds.

*Signal x increases approximately with slope* 2.5 *during the next* 100 *time units:*

$$\operatorname{On}\_{[0,100]} \operatorname{Max} |x - 2.5T| - \operatorname{On}\_{[0,100]} \operatorname{Min} |x - 2.5T| \le 0.1$$

#### **3 Syntax and Semantics**

From the examples above we can foresee what the new language looks like. Formally, an *(input) signal* is a function $w : \mathbb{T} \to \mathbb{R}^n$, where the time domain $\mathbb{T}$ is a closed real interval $[0, |w|] \subseteq \mathbb{R}$, and the number $|w|$ is the *duration* of the signal. We refer to signal components using their own letters: $x, y, \dots \in \mathbb{T} \to \mathbb{R}$. We assume that every signal component is piecewise-constant or piecewise-linear.

The semantics of a formula is a piecewise-constant or piecewise-linear function from real time (thus, it has real-valued switching points) to a dual number (rather than a real). We defer the discussion of dual numbers until Sect. 3.2; for now we note that they extend reals, and a dual number can be written in the form $a + b\varepsilon$, which, when $b \neq 0$, denotes a point infinitely close to *a*. We denote the set of dual numbers as $\mathbb{R}\_\varepsilon$. Our primary use of a dual number is to represent a time point strictly after an event (switching point, threshold crossing, etc.) but before any other event can happen; as a result we have to allow an output signal to have a dual value, denoting a value that is attained at this dual time point.

**Syntax.** We can write the abstract syntax of our language as follows:

$$\begin{aligned} \varphi &::= c \mid x \mid f(\varphi\_1 \cdots \varphi\_n) \mid \operatorname{On}\_{[a,b]} \psi \mid \psi \operatorname{U}\_{[a,b]}^d \varphi \mid \varphi\_1 \downarrow \operatorname{U}\_{[a,b]}^d \varphi\_2 \\ \psi &::= \operatorname{Min} \varphi \mid \operatorname{Max} \varphi \end{aligned} \tag{1}$$

where *c* is a real-valued constant; *x* refers to an input signal; *f* is a real-valued function symbol (e.g., sum, absolute value, etc.); for the On-operator, *a* and *b* can be real numbers or (with some abuse of notation) ±∞, i.e., the interval may refer to both past and future, bounded or unbounded; for the U-operator, *d* is a real value, *a* and *b* are non-negative, and *b* can be ∞, i.e., the interval refers to bounded or unbounded future. Let us go over some of the features of the new language and then formally write down its semantics.

**Point-wise Functions.** Function symbol *f* ranges over real-valued functions $\mathbb{R}^n \to \mathbb{R}$ that preserve the chosen shape of signals (and can be lifted to dual numbers). In this paper, we focus on piecewise-constant and piecewise-linear signals, so when *f* is applied point-wise to a piecewise-constant input, we want the result to be piecewise-constant; when *f* is applied point-wise to a piecewise-linear input, we want the result to be piecewise-linear. Examples of such functions are addition, subtraction, min and max of finitely many operands (we use lowercase min and max to denote a real-valued n-ary function), multiplication by a constant, absolute value, etc.

**Boolean Output Signals.** Output signals of some formulas can informally be interpreted as Boolean-valued. In Example 2, "*x*" and "$\operatorname{On}\_{[0,85]} \operatorname{Max} x$" are dual-valued, but the result of their comparison, "$x \ge \operatorname{On}\_{[0,85]} \operatorname{Max} x$", should be interpreted as Boolean. Here, we take the simpler path and treat a Boolean signal as a special case of a real-valued signal that can take the value of 0 or 1. We expect comparison operators to produce a value in {0, 1}, e.g., $\varphi\_1 \le \varphi\_2$ is a shortcut for "if $\varphi\_1 \le \varphi\_2$ then 1 else 0". Standard Boolean connectives can then be defined as follows:

$$\varphi\_1 \land \varphi\_2 = \min\{\varphi\_1, \varphi\_2\} \qquad \qquad \varphi\_1 \lor \varphi\_2 = \max\{\varphi\_1, \varphi\_2\} \qquad \qquad \neg \varphi = 1 - \varphi$$

Another option would be to distinguish Boolean-valued formulas on the syntactic level.
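The encoding of Boolean signals as {0, 1}-valued signals can be sketched in a few lines. The function names below (`leq`, `conj`, `disj`, `neg`) are hypothetical and operate on single sample values; applying them point-wise lifts them to signals.

```python
def leq(a, b):
    """Comparison as a {0, 1}-valued operation: 'if a <= b then 1 else 0'."""
    return 1 if a <= b else 0

def conj(a, b):
    """Conjunction of {0, 1}-valued operands as their minimum."""
    return min(a, b)

def disj(a, b):
    """Disjunction of {0, 1}-valued operands as their maximum."""
    return max(a, b)

def neg(a):
    """Negation of a {0, 1}-valued operand."""
    return 1 - a

# Point-wise application to two sampled {0, 1}-valued signals.
p = [1, 1, 0, 0]
q = [1, 0, 1, 0]
p_and_q = [conj(a, b) for a, b in zip(p, q)]
```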

**Temporal** ϕ**-Formulas.** Symbol ϕ denotes a temporal formula that has a dual-valued output signal. In other words, it can be evaluated at a time point and produces a dual value. A ϕ-formula may:


**Interval** ψ**-Formulas.** A ψ-formula is evaluated on an interval and does not have an output signal by itself. Instead, it supplies an aggregate operation that will be computed when evaluating the containing On-formula or "until"-formula. It should be possible to efficiently compute this aggregate operation over a sliding window, and it should preserve the chosen shape of signals. Since we focus on piecewise-constant and piecewise-linear signals, the two operations that we can immediately offer are Min and Max, which can be efficiently computed over a sliding window using the algorithm of Lemire [9,15], and preserve the piecewise-constant and piecewise-linear shapes. In discrete time or for piecewise-polynomial signals, we could use more aggregate operations, e.g., integration.
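To give an intuition for why a sliding-window maximum can be computed in time that does not depend on the window width, here is a discrete-time sketch based on a monotonic deque, in the spirit of the algorithm cited above. It is not the continuous-time procedure used for piecewise signals; the function name `sliding_max` and the sample-based window are assumptions of this sketch.

```python
from collections import deque

def sliding_max(samples, width):
    """Return [max(samples[i : i + width])] for every full window.

    The deque holds indices of candidate maxima; the corresponding values
    decrease from front to back, so the front is the maximum of the window."""
    out = []
    candidates = deque()
    for i, value in enumerate(samples):
        # Drop candidates that can no longer be a maximum of any later window.
        while candidates and samples[candidates[-1]] <= value:
            candidates.pop()
        candidates.append(i)
        # Drop the front candidate once it falls out of the current window.
        if candidates[0] <= i - width:
            candidates.popleft()
        if i >= width - 1:
            out.append(samples[candidates[0]])
    return out

def sliding_min(samples, width):
    """Sliding minimum obtained from the sliding maximum by negation."""
    return [-m for m in sliding_max([-s for s in samples], width)]
```

Every index is appended once and removed at most once, so the total work is linear in the number of samples regardless of `width`.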

**"Eventually" and "Always".** Standard STL "eventually" and "always" operators can be expressed in the new language as follows:

$$\operatorname{F}\_{[a,b]}\,\varphi = \operatorname{On}\_{[a,b]} \operatorname{Max}\varphi \qquad\qquad \operatorname{G}\_{[a,b]}\,\varphi = \operatorname{On}\_{[a,b]} \operatorname{Min}\varphi$$

#### **3.1 Semantics of Until-Free Fragment**

The semantics of the until-free fragment is straightforward. The semantics of a ϕ-formula is a function $[\![\varphi]\!] : \mathbb{T} \to \mathbb{R}\_\varepsilon$ mapping real time to a dual value. We define it as:

$$\begin{aligned} [\![x]\!](t) &= x(t) & [\![\operatorname{On}\_{[a,b]}\,\psi]\!](t) &= [\![\psi]\!]([t+a,t+b]) \\ [\![f(\varphi\_1 \ldots \varphi\_n)]\!](t) &= f([\![\varphi\_1]\!](t) \ldots [\![\varphi\_n]\!](t)) \end{aligned} \tag{2}$$

We abuse the notation so that *x* is both a symbol referring to a component of an input signal and the corresponding real-valued function; similarly, *f* is both a function symbol and the corresponding function.

The semantics of a ψ-formula is a function $[\![\psi]\!] : (\mathbb{R} \cup -\infty) \times (\mathbb{R}\_\varepsilon \cup \infty) \to \mathbb{R}\_\varepsilon$ from an interval of time with real lower bound to a dual value. The upper bound of the interval can be dual-valued, which will be used by the "until" operation (see Sect. 3.3).

$$[\![\operatorname{Min} \varphi]\!][a,b] = \min\_{[a,b]} [\![\varphi]\!] \qquad\qquad [\![\operatorname{Max} \varphi]\!][a,b] = \max\_{[a,b]} [\![\varphi]\!] \tag{3}$$

The way we define min and max over an interval for a discontinuous piecewise-linear function relies on dual numbers, which we explain just below.

#### **3.2 Dual Numbers**

Dual numbers extend reals with a new element ε that has the property $\varepsilon^2 = 0$. A dual number can be written in the form $a + b\varepsilon$, where $a, b \in \mathbb{R}$. We denote the set of dual numbers as $\mathbb{R}\_\varepsilon$. Dual numbers were proposed by the English mathematician W. Clifford in 1873 and later applied in geometry by the German mathematician E. Study. One of the modern applications of dual numbers and their extensions is in automatic differentiation [12]: one can exactly compute the value of the first derivative at a given point using the identity $f(x + \varepsilon) = f(x) + f'(x)\varepsilon$. Intuitively, ε can be understood as an infinitesimal value, and $a + b\varepsilon$ (for $b \neq 0$) is a point that is infinitely close to *a*. Polynomial functions can be extended to dual numbers, and via Taylor expansion, so can exponents, logarithms, and trigonometric functions. We work with piecewise-constant and piecewise-linear functions with real switching points, and we only make use of basic arithmetic. For example, if on the interval $(b\_1, b\_2)$ the signal *x* is defined as $x(t) = a\_1 t + a\_0$, then $x(b\_1 + \varepsilon) = a\_1 b\_1 + a\_0 + a\_1\varepsilon$ and $x(b\_2 - \varepsilon) = a\_1 b\_2 + a\_0 - a\_1\varepsilon$.
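The arithmetic of dual numbers and the differentiation identity can be made concrete with a small sketch. The class `Dual`, the helper `_lift`, and the polynomial `f` below are hypothetical names chosen for illustration; only addition and multiplication are implemented.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dual:
    """A dual number real + eps * epsilon, where epsilon ** 2 == 0."""
    real: float
    eps: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.real + other.real, self.eps + other.eps)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # (a + b eps)(c + d eps) = ac + (ad + bc) eps, because eps^2 = 0.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

    __rmul__ = __mul__

def _lift(value):
    """Treat a plain number as a dual number with zero epsilon part."""
    return value if isinstance(value, Dual) else Dual(float(value))

def f(x):
    """Example polynomial f(x) = 3x^2 + 2x + 1, so f'(x) = 6x + 2."""
    return 3 * x * x + 2 * x + 1

result = f(Dual(2.0, 1.0))  # evaluates f(2 + epsilon)
```

Here `result.real` is $f(2) = 17$ and `result.eps` is $f'(2) = 14$, which matches the identity $f(x + \varepsilon) = f(x) + f'(x)\varepsilon$.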

**Fig. 3.** Signals *x* and *y* for Example 8.

**Fig. 4.** Signals *x* and *y* for Examples 5 and 6.

Our primary use of a dual number is to represent a time point strictly after an event (a switching point, a threshold crossing, etc.) but before any other event can happen, i.e., we use *t* + ε to represent the time point that happens right after *t*. The coefficient 1 at ε denotes that time advances with the rate of 1 (although another consistently used coefficient works as well). Consequently, we also allow an output signal to produce a dual value, denoting a value that is attained at this dual time point. On the other hand, we require that signals are defined over real time, switching points of piecewise signals are reals, and time constants in formulas are reals. That is, dual-valued time is only used internally by the temporal operators and cannot be directly observed.

**Minimum and Maximum of a Discontinuous Function.** We also use dual-valued time to define the result of Min and Max for a discontinuous piecewise-linear function. The standard way to compute minimum and maximum of a continuous piecewise-linear function on a closed interval is based on the fact that they are attained at the endpoints of the interval or at the endpoints of the segments on which the function is defined. Using dual numbers, we extend it to discontinuous functions: if for $t \in (b\_1, b\_2)$, $x(t) = a\_1 t + a\_2$, then we consider time points $b\_1 + \varepsilon$ and $b\_2 - \varepsilon$ as the candidates for reaching the minimum or maximum. Let us demonstrate this with an example.

**Example 5.** Consider the signal *x* defined as: "$x(t) = -0.5t + 1.5$ if $t \in [0, 1)$; $x(t) = 0.5t + 1$ if $t \ge 1$", as shown in Fig. 4. Let us find the minimum of *x* on the interval $[0, 2 + \varepsilon]$. By our definition, $\min\_{t \in [0, 2+\varepsilon]} x(t) = \min\{x(0), x(1 - \varepsilon), x(1), x(2 + \varepsilon)\} = x(1 - \varepsilon) = 1 + 0.5\varepsilon$. This result should be understood as follows: *x*(*t*) approaches the value of 1 from above with derivative −0.5, but never reaches it.

**Example 6.** Our definition of minimum and maximum allows us to correctly compare values of piecewise-linear functions around their discontinuity points. In Example 5, *x* never reaches the value of its lower bound, and our definition of minimum produces a dual number that reflects this fact and also specifies the rate at which *x* approaches its lower bound. This information would be lost if we computed the infimum of *x*. Again consider the signals in Fig. 4, with *x* defined as before, and "$y(t) = t$ if $t \in [0, 1)$; $y(t) = -0.5t + 1$ if $t \ge 1$". Let us evaluate at time $t = 0$ the formula $\operatorname{On}\_{[0,2]} \operatorname{Min} x > \operatorname{On}\_{[0,2]} \operatorname{Max} y$, which denotes the property $\forall t, t' \in [0, 2].\ x(t) > y(t')$. From the previous example, we have that $[\![\operatorname{On}\_{[0,2]} \operatorname{Min} x]\!](0) = 1 + 0.5\varepsilon$. By a similar argument, $[\![\operatorname{On}\_{[0,2]} \operatorname{Max} y]\!](0) = y(1 - \varepsilon) = 1 - \varepsilon$, which means that *y* approaches 1 from below with the rate of 1. Since $1 + 0.5\varepsilon > 1 - \varepsilon$, our property holds at time 0, as expected.
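The computations in Examples 5 and 6 can be replayed with a few lines of code. In this sketch, a dual number $a + b\varepsilon$ is represented as the tuple `(a, b)`, and we assume that dual numbers are ordered lexicographically (first by the real part, then by the coefficient at ε), which is exactly how Python compares tuples. The helper `line` and the variable names are hypothetical.

```python
def line(slope, offset):
    """Return a function that evaluates slope * t + offset at a dual time point.

    A dual time point t + dt * epsilon is passed as the tuple (t, dt); the
    result is the dual value (slope * t + offset) + (slope * dt) * epsilon."""
    def evaluate(time):
        t, dt = time
        return (slope * t + offset, slope * dt)
    return evaluate

# Segments of x and y from Examples 5 and 6: one for [0, 1), one for t >= 1.
x_left, x_right = line(-0.5, 1.5), line(0.5, 1.0)
y_left, y_right = line(1.0, 0.0), line(-0.5, 1.0)

# Candidates are the interval endpoints and the (dual) segment endpoints.
min_x = min(x_left((0, 0)), x_left((1, -1)), x_right((1, 0)), x_right((2, 1)))
max_y = max(y_left((0, 0)), y_left((1, -1)), y_right((1, 0)), y_right((2, 0)))

property_holds = min_x > max_y
```

The minimum of *x* comes out as `(1.0, 0.5)`, i.e., $1 + 0.5\varepsilon$, and the maximum of *y* as `(1.0, -1.0)`, i.e., $1 - \varepsilon$, so the comparison succeeds.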

We want to emphasize that while an output signal can take a dual value, its domain is considered to be a subset of reals. The semantics of temporal operators is allowed to internally use dual-valued time points, but has to produce an output signal that is defined over real time. This ensures that a piecewise signal always has real-valued switching points and that no event can happen at a dual-valued time point.

**Example 7.** Consider a formula $\varphi = \operatorname{F}\_{[0,2]}(x = \operatorname{On}\_{(-\infty,\infty)} \operatorname{Min} x)$, where *x* is as in Fig. 4. The meaning of ϕ is that within 2 time units *x* reaches its global minimum. In our semantics, this formula does not hold at time 0. By our definition, the global minimum of *x* is 1 + 0.5ε, so the semantics of the formula at time 0 is equivalent to:

$$\begin{aligned} [\![\varphi]\!](0) &= [\![\operatorname{F}\_{[0,2]}(x = 1 + 0.5\varepsilon)]\!](0) \\ &= \text{if } \exists t \in \mathbb{T}.\ t \in [0,2] \land x(t) = 1 + 0.5\varepsilon \text{ then } 1 \text{ else } 0 \end{aligned}$$

where $\mathbb{T} = [0, |w|] \subseteq \mathbb{R}$. There is no real value of time where *x*(*t*) yields a dual value, so the formula does not hold.

#### **3.3 Semantics of Until**

The On-operator allowed us to compute minima and maxima over a sliding window of fixed width. In this section, we introduce a new version of "until" operator that allows the window to have variable width that depends on the output signal of some formula.

**Reinterpreting the Classical Until as "Find First".** Let us explain how we extend the "until" operator to work in the new setting. There already exists a real-valued robust semantics of "until", but we do not believe it to be a good specification primitive. Instead, we re-state the standard Boolean semantics and, based on the re-stated version, introduce the new real- (actually, dual-) valued semantics. Let us recall a possible semantics of untimed until in STL. Informally, "until" computes a conjunction of the values of the first operand over an interval that is not fixed, but defined by the second operand. Formally,

$$[\![p \operatorname{U}^{\mathrm{STL}} q]\!](t) = \exists t' \ge t.\ q(t') \land \forall s \in [t, t'].\ p(s)$$

To denote the STL version of "until", we write it with a superscript, $\operatorname{U}^{\mathrm{STL}}$, to distinguish it from the new version that we define for our language. The version of "until" that we use in this paper is non-strict in the sense of [17]; it requires that *p* holds both at *t* and *t′*.

Efficient monitoring of STL "until" relies on instantiating the existential quantifier. The monitor scans the signal backwards and instantiates *t′* based on the earliest time point where *q* is true. The monitor needs to consider three cases shown in Figs. 5, 6 and 7.

**Fig. 5.** Case 1: *q* is never true in the future.

**Fig. 6.** Case 2: there exists the earliest time point where *q* becomes true.


Below is the equivalent semantics of STL until that resolves the existential quantifier:

$$[\![p \operatorname{U}^{\mathrm{STL}} q]\!](t) = \begin{cases} \forall s \in [t, t'].\ p(s), & \text{if there exists the smallest } t' \ge t \text{ s.t. } q(t') \\ \forall s \in [t, t' + \varepsilon].\ p(s), & \text{where } t' = \inf\{t' \mid t' \ge t \land q(t')\}, \\ & \text{if } \exists t' \ge t.\ q(t'), \text{ but there is no smallest } t' \\ \text{false}, & \text{otherwise} \end{cases}$$

Then, a monitor evaluates the universal quantifier via a finite conjunction, since in practice the signal *p* has finite variability, i.e. every interval is intersected by a finite number of constant segments.

**Example 8.** Let us consider two linear input signals, $x(t) = t$ and $y(t) = 2t - 1$ (see Fig. 3), and let us evaluate the formula $(y \le x) \operatorname{U}^{\mathrm{STL}} (x > 1)$ at time 0 using non-strict "until" semantics. We define the earliest time point where $x > 1$ becomes true to be $1 + \varepsilon$, thus we need to evaluate the expression $\forall t \in [0, 1 + \varepsilon].\ y(t) \le x(t)$. At time $1 + \varepsilon$, we get $y(1 + \varepsilon) = 1 + 2\varepsilon > 1 + \varepsilon = x(1 + \varepsilon)$, thus the "until" formula does not hold. Informally, we can interpret the result as follows: when *x* becomes greater than 1, *y* becomes greater than *x*, while non-strict "until" requires that there exists a point where both its left- and right-hand operands hold at the same time.

**New Until as "Find First".** At this point, extending "until" to produce a dual value is straightforward. With every time point, "until" possibly associates an interval, and we can compute an arbitrary aggregate function over it, instead of just conjunction. In fact, we introduce two flavors of "until". The first version, $\psi \operatorname{U}^d\_{[a,b]} \varphi$, works as follows. For every time point *t*, we either associate an interval ending when ϕ becomes non-zero (i.e., starts holding), or we report that no suitable end point was found. When such an interval exists, we evaluate ψ on it. When the interval does not exist, we produce *d*. Formally,

$$[\![\psi \operatorname{U}\_{[a,b]}^{d} \varphi]\!](t) = \begin{cases} [\![\psi]\!][t, t'], & \text{if } \exists \text{ the smallest } t' \in [t + a, t + b] \text{ s.t. } [\![\varphi]\!](t') \neq 0 \\ [\![\psi]\!][t, t' + \varepsilon], & \text{where } t' = \inf\{t' \mid t' \in [t + a, t + b] \land [\![\varphi]\!](t') \neq 0\}, \\ & \text{if } \exists t' \in [t + a, t + b].\ [\![\varphi]\!](t') \neq 0, \text{ but there is no smallest } t' \\ d, & \text{otherwise} \end{cases}$$
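The "find first" reading of the aggregating "until" can be sketched in discrete time, where a first sample with a non-zero trigger always exists whenever any exists, so the case involving ε does not arise. The function `until` below is a hypothetical brute-force illustration over sampled signals with integer bounds `a` and `b`; it is not the efficient monitoring algorithm and not the continuous-time semantics.

```python
def until(aggregate, operand, trigger, a, b, default):
    """Discrete-time sketch of (aggregate operand) U^default_[a,b] trigger.

    For each index t, find the first index u in [t + a, t + b] (clipped to the
    signal) with trigger[u] != 0 and aggregate operand over [t, u]; if no such
    index exists, produce default."""
    n = len(operand)
    out = []
    for t in range(n):
        window = range(t + a, min(t + b, n - 1) + 1)
        end = next((u for u in window if trigger[u] != 0), None)
        out.append(default if end is None else aggregate(operand[t:end + 1]))
    return out

# Max of x until q (hypothetical sample values); -1 marks "no end point found".
x = [3, 1, 4, 1, 5]
q = [0, 0, 1, 0, 0]
max_until = until(max, x, q, 0, 10, default=-1)

# Boolean STL "until" as Min with default 0: p must hold up to and including
# the first point where q holds.
p = [1, 1, 1, 0, 1]
stl_until = until(min, p, q, 0, 10, default=0)
```

With these inputs, `max_until` is `[4, 4, 4, -1, -1]` and `stl_until` is `[1, 1, 1, 0, 0]`.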

The second version: <sup>ϕ</sup><sup>1</sup> <sup>↓</sup> <sup>U</sup><sup>d</sup> a,b] ϕ<sup>2</sup> does not perform aggregation, but evaluates ϕ<sup>1</sup> at the time point where ϕ<sup>2</sup> becomes non-zero, or produces *d* if such time point does not exist:

$$
[\![\varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2]\!](t) =
\begin{cases}
[\![\varphi_1]\!](t'), & \text{if there exists the smallest } t' \in [t + a, t + b] \text{ s.t. } [\![\varphi_2]\!](t') \neq 0 \\
[\![\varphi_1]\!](t' + \varepsilon), & \text{where } t' = \inf\{t'' \mid t'' \in [t + a, t + b] \land [\![\varphi_2]\!](t'') \neq 0\}, \\
& \text{if } \exists t'' \in [t + a, t + b].\ [\![\varphi_2]\!](t'') \neq 0, \text{ but there is no smallest such } t'' \\
d, & \text{otherwise}
\end{cases}
$$

In a similar way, we could define past versions of "until", where the interval $[a, b]$ refers to the past; we do not discuss them here due to space constraints.

**STL Until.** The standard STL "until" can be expressed in the new language as follows:

$$
\varphi\_1 \operatorname{U}\_{[a,b]}^{\operatorname{STL}} \varphi\_2 = (\operatorname{Min} \varphi\_1) \operatorname{U}\_{[a,b]}^0 \varphi\_2
$$

**Lookup.** Using "until", we can express the "lookup" operator that queries the value of a signal at a point in the future, or returns some default value if the point does not exist.

$$\mathbf{D}\_a^d \,\varphi = \varphi \downarrow \mathbf{U}\_{[a,a]}^d \mathbf{1}$$

**Example 9 (Spike).** The ST-Lib library [14] uses the following formula to define a start point of a spike: $x' > m \land \mathrm{F}_{[0,d]}(x' < -m)$, where $x'$ is the approximation of the right derivative, $x'(t) = (x(t + \delta) - x(t))/\delta$, $m$ is the magnitude of the spike, and $d$ is the width. Using the lookup operator, we can include the definition of $x'$ in the property itself:

$$(\mathrm{D}^{y}_{\delta}\, x - x)/\delta \ge m \ \land\ \mathrm{F}_{[0,d]}\big((\mathrm{D}^{y}_{\delta}\, x - x)/\delta \le -m\big),$$

where y gives the value of the signal outside of its original domain.

**Fig. 8.** Before time 2, an event *p* is followed by an event *q*.

**Fig. 9.** A sequence of spikes and a Boolean signal marking the detected start times of spikes. (Color figure online)

**Example 10 (Spike of Given Width and Height).** Our language offers several alternative ways to define a spike. We can define a (start point of a) spike by composing two ramps: an increasing one, where the signal $x$ increases by at least $m$ within $w$ time units, and a decreasing one, where $x$ decreases by at least $m$ within $w$ time units; the two ramps should be at most $w$ units apart. The parameter $w$ is the half-width of the spike.

$$(\mathrm{On}_{[0,w]} \mathrm{Max}\, x \ge x + m) \ \land\ \mathrm{F}_{[0,w]}(\mathrm{On}_{[0,w]} \mathrm{Min}\, x \le x - m)$$

Figure 9 shows an example of a series of spikes (blue) and a Boolean signal (red) that marks the detected start times of spikes.

**Example 11 (TPTL-like Assertion).** The second form of "until" allows one to reason explicitly about time points and durations, somewhat similarly to TPTL. Consider the property "within 2 time units, we should observe an event $p$ followed by an event $q$" (Fig. 8 shows an example of a satisfying signal). With some case analysis, this property can be expressed in MTL [5], but probably the best way to express it is offered by TPTL, which allows one to assert "$c.\ \mathrm{F}(p \land \mathrm{F}(q \land c \le 2))$", meaning "reset a clock $c$; eventually, we should observe $p$, and from that point, eventually we should observe $q$, while the clock value will be at most 2". To express the property in our language, we introduce three auxiliary signals: $T(t) = t$ (which we use in some other examples as well); $pdelay = (T \downarrow \mathrm{U}^{\infty} p) - T$, which denotes the duration until the next occurrence of $p$; and similarly $qdelay = (T \downarrow \mathrm{U}^{\infty} q) - T$, the duration until the next occurrence of $q$. Then, the property can be expressed as $pdelay + (qdelay \downarrow \mathrm{U}^{\infty} p) \le 2$.

#### **4 Monitoring**

Similarly to other works on STL monitoring (e.g., [9]), we implement the algorithms for a subset of the language, and support the remaining operators via rewriting rules.

**Rewriting of Until.** Similarly to STL, the timed "until" operator in our language can be expressed in terms of "eventually" (which is expressed using On), "lookup", and untimed "until".

$$\begin{aligned}
(\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\, \varphi_2 \text{ then } d \text{ else } \mathrm{On}_{[0,a]} \mathrm{Min}\big((\mathrm{Min}\ \varphi_1)\ \mathrm{U}\ \varphi_2\big) \\
(\mathrm{Max}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\, \varphi_2 \text{ then } d \text{ else } \mathrm{On}_{[0,a]} \mathrm{Max}\big((\mathrm{Max}\ \varphi_1)\ \mathrm{U}\ \varphi_2\big) \\
\varphi_1 \downarrow \mathrm{U}^{d}_{[a,b]}\ \varphi_2 &= \text{if } \neg\mathrm{F}_{[a,b]}\, \varphi_2 \text{ then } d \text{ else } \mathrm{D}_{a}(\varphi_1 \downarrow \mathrm{U}\ \varphi_2)
\end{aligned}$$

Let us prove the first equivalence; the proof idea for the other two is similar. Let $t$ be the time point where we evaluate $(\mathrm{Min}\ \varphi_1)\ \mathrm{U}^{d}_{[a,b]}\ \varphi_2$ and its rewriting. If there is no time point $s \in [t + a, t + b]$ where $\varphi_2$ holds, both the original formula and its rewriting evaluate to $d$. Otherwise, let $s$ be the earliest time point in $[t + a, t + b]$ where $\varphi_2$ holds, which can be a real or dual value, as explained in Sect. 3.3. Then the original formula evaluates to $\min\{\varphi_1(t') \mid t' \in [t, s]\}$. The rewritten formula at $t$ evaluates to $\min\{[\![(\mathrm{Min}\ \varphi_1)\ \mathrm{U}\ \varphi_2]\!](t') \mid t' \in [t, t + a]\}$. Notice that for every $t'$ there is a time point in the future, which we denote $g(t')$, where $\varphi_2$ holds; this point is at most $s$, and for $t' = t + a$ it is exactly $s$. That is, the rewritten formula evaluates to $\min\{\min\{\varphi_1(t'') \mid t'' \in [t', g(t')]\} \mid t' \in [t, t + a]\} = \min\{\varphi_1(t'') \mid t'' \in \bigcup\{[t', g(t')] \mid t' \in [t, t + a]\}\}$. Notice that since $g(t') \in [t', s]$ and $g(t + a) = s$, we have $\bigcup\{[t', g(t')] \mid t' \in [t, t + a]\} = [t, s]$, and thus the rewritten formula evaluates to the same value as the original one.

**Referring to Both Future and Past.** In the syntax, we allow the $\mathrm{On}_{[a,b]}$ operator to refer to both future and past, i.e., we allow the case when $a < 0$ and $b > 0$. Algorithms for Min/Max over a running window typically cannot work with this situation directly, and we need to apply the following rewriting: if $a < 0$ and $b > 0$,

$$\begin{aligned}
\mathrm{On}_{[a,b]} \mathrm{Min}\ \varphi &= \min\{\mathrm{On}_{[a,0]} \mathrm{Min}\ \varphi,\ \mathrm{On}_{[0,b]} \mathrm{Min}\ \varphi\} \\
\mathrm{On}_{[a,b]} \mathrm{Max}\ \varphi &= \max\{\mathrm{On}_{[a,0]} \mathrm{Max}\ \varphi,\ \mathrm{On}_{[0,b]} \mathrm{Max}\ \varphi\}
\end{aligned}$$

**Language of the Monitor.** The following subset of the language is as expressive as the full language presented in (1). We implement the monitoring algorithms for this subset and support the full syntax of (1) via rewriting.

$$\begin{aligned}
\varphi &::= c \mid x \mid f(\varphi_1 \cdots \varphi_n) \mid \mathrm{On}_{[a,b]}\ \psi \mid \psi\ \mathrm{U}^{d}\ \varphi \mid \varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2 \mid \mathrm{D}^{d}_{a}\ \varphi \\
\psi &::= \mathrm{Min}\ \varphi \mid \mathrm{Max}\ \varphi
\end{aligned}$$

where either $a \ge 0$ or $b \le 0$, i.e., the interval $[a, b]$ cannot refer to both future and past.

All operators in the language of the monitor admit efficient offline monitoring. Minimum and maximum over a sliding window, as required by the On-operator, can be computed using a variation of Lemire's algorithm [9,15]; the "lookup" operator D shifts its input signal by a constant distance; and for untimed "until" we can scan the input signal backwards and perform a special case of running minimum or maximum.

#### **4.1 Monitoring Algorithms**

In this section, we briefly describe monitoring algorithms for piecewise-constant signals.

**Representation of Signals.** We represent a piecewise-constant function $\mathbb{T} \to \mathbb{R}$ or $\mathbb{T} \to \mathbb{R}_{\varepsilon}$ as a sequence of segments $\langle s_0, s_1, \ldots, s_{m-1} \rangle$, where every segment $s_i = J_i \mapsto v_i$ maps an interval $J_i$ to a real or dual value $v_i$. The intervals $J_i$ form a partition of the domain of the signal and are ordered in ascending time order, i.e., $\sup J_i = \inf J_{i+1}$ and $J_i \cap J_{i+1} = \emptyset$. The domain of the signal corresponding to the sequence $u = \langle J_0 \mapsto v_0, \ldots, J_{m-1} \mapsto v_{m-1} \rangle$ is denoted by $dom(u) = J_0 \cup \ldots \cup J_{m-1}$. For example, if the function $x(t)$ is defined as $x(t) = 0$ if $t \in [0, 1)$ and $x(t) = 1$ if $t \in [1, 2]$, then $x(t)$ is represented by the sequence $u_x = \langle [0, 1) \mapsto 0,\ [1, 2] \mapsto 1 \rangle$, and $dom(u_x) = [0, 2]$. Empty brackets $\langle\rangle$ denote an empty sequence, which does not represent a valid signal but can be used by algorithms as an intermediate value.

We manipulate the sequences with the following operations. The function *append* adds a segment to the end of a sequence: $append(\langle s_0, \ldots, s_{m-1} \rangle, s') = \langle s_0, \ldots, s_{m-1}, s' \rangle$. The function *prepend* adds a segment to the start of a sequence: $prepend(\langle s_0, \ldots, s_{m-1} \rangle, s') = \langle s', s_0, \ldots, s_{m-1} \rangle$. This may produce a sequence where the first segment does not start at time 0. While such a sequence does not represent a valid signal, it can be used by the algorithms as an intermediate value. The function *removeLast* removes the last segment of a sequence, assuming the sequence is non-empty: $removeLast(\langle s_0, \ldots, s_{m-1} \rangle) = \langle s_0, \ldots, s_{m-2} \rangle$.

An output signal of a formula is scalar-valued and is represented by one such sequence. An input signal usually has multiple components, i.e., it is a function $\mathbb{T} \to \mathbb{R}^n$, and is represented by a set of $n$ sequences.
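The segment representation can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the data structure of the authors' C++ tool: every interval is taken to be half-open, `[start, end)`, so the distinction between open and closed endpoints and the dual values are ignored, and the names `Segment`, `remove_last` and `dom` are chosen here for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    """One constant piece J -> v, with J simplified to the half-open [start, end)."""
    start: float
    end: float
    value: float


Signal = List[Segment]


def append(u: Signal, s: Segment) -> Signal:
    """Return a new sequence with s added at the end."""
    return u + [s]


def prepend(u: Signal, s: Segment) -> Signal:
    """Return a new sequence with s added at the start."""
    return [s] + u


def remove_last(u: Signal) -> Signal:
    """Return a new sequence without its last segment; u must be non-empty."""
    if not u:
        raise ValueError("cannot remove a segment from an empty sequence")
    return u[:-1]


def dom(u: Signal) -> Tuple[float, float]:
    """Bounds of the domain covered by a non-empty, contiguous sequence."""
    return (u[0].start, u[-1].end)


# x(t) = 0 on [0, 1) and 1 on [1, 2), built segment by segment.
u_x = append(append([], Segment(0.0, 1.0, 0.0)), Segment(1.0, 2.0, 1.0))
```

Returning new lists instead of mutating in place keeps intermediate values, such as a sequence that does not yet start at time 0, from leaking into other computations.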

**On-Formulas.** For $\mathrm{On}_{[a,b]} \mathrm{Min}\ \varphi$ and $\mathrm{On}_{[a,b]} \mathrm{Max}\ \varphi$, a monitor needs to compute the minimum or maximum of the output signal of $\varphi$ over the sliding window. The corresponding algorithm was developed for discrete time by Lemire [15] and later adapted for continuous time [9].
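To give an intuition for the sliding-window computation, the sketch below shows the well-known monotonic-deque technique for a running minimum over uniformly sampled data. It is a discrete-time illustration only: the function name `sliding_min_future` and the integer window width `w` (in samples) are assumptions, and the continuous-time adaptation used for piecewise-constant signals is not reproduced here.

```python
from collections import deque
from typing import Deque, List, Sequence


def sliding_min_future(xs: Sequence[float], w: int) -> List[float]:
    """out[i] = min(xs[i : i + w + 1]); the window is cut off at the end of xs.

    This corresponds to On_[0,w] Min on samples taken with time step 1.
    """
    n = len(xs)
    out: List[float] = [0.0] * n
    # Candidate indices; the front holds the current minimum and is also the
    # index that leaves the window first, values increase towards the back.
    cand: Deque[int] = deque()
    for i in range(n - 1, -1, -1):      # backwards, so the window lies in the future
        while cand and xs[cand[-1]] >= xs[i]:
            cand.pop()                  # can never be the minimum again
        cand.append(i)
        while cand[0] > i + w:
            cand.popleft()              # fell out of the window [i, i + w]
        out[i] = xs[cand[0]]
    return out
```

Each index enters and leaves the deque at most once, so the running time is linear in the number of samples and independent of `w`. The maximum is obtained symmetrically by flipping the comparison.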

**Lookup-Formulas.** Computing the output signal for $\mathrm{D}^{d}_{a}\ \varphi$ is straightforward. We shift every segment of $u_{\varphi}$ (the representation of the output signal of $\varphi$) to the left by $a$, truncating at 0, and append a padding segment with the value $d$.
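The shift-and-pad step can be sketched as follows. The sketch assumes $a \ge 0$, half-open segments `[start, end)` and real values only; `Segment` and `lookup` are names introduced for this illustration and are not taken from the tool.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    start: float
    end: float
    value: float


def lookup(u: List[Segment], a: float, d: float) -> List[Segment]:
    """Output of D^d_a for a non-empty signal u: value of u at t + a, or d outside dom(u)."""
    end_of_domain = u[-1].end
    out: List[Segment] = []
    for seg in u:
        start = max(seg.start - a, 0.0)   # shift left by a, truncate at time 0
        end = seg.end - a
        if end > start:                   # drop segments shifted entirely below 0
            out.append(Segment(start, end, seg.value))
    if a > 0:
        # The last a time units look past the end of the signal: pad with d.
        out.append(Segment(max(end_of_domain - a, 0.0), end_of_domain, d))
    return out
```

For the signal that is 0 on [0, 1) and 1 on [1, 2), looking up 0.5 time units ahead with default -1 gives 0 on [0, 0.5), 1 on [0.5, 1.5) and -1 on [1.5, 2).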

**Until-Formulas.** Informally, monitoring the "until"-formulas $\mathrm{Min}\ \varphi_1\ \mathrm{U}^{d}\ \varphi_2$, $\mathrm{Max}\ \varphi_1\ \mathrm{U}^{d}\ \varphi_2$, and $\varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2$ works as follows. The monitor scans the output signals of $\varphi_1$ and $\varphi_2$ backwards. While $\varphi_2$ evaluates to a non-zero value, the monitor outputs the value of $\varphi_1$. When $\varphi_2$ evaluates to 0, the monitor outputs either the default value (if the monitor did not yet encounter a non-zero value of $\varphi_2$), or the running minimum or maximum of $\varphi_1$, or the value that $\varphi_1$ had at the last time point where $\varphi_2$ was non-zero.

The functions *until* and *untilAdd* in Fig. 10 implement this idea. The inputs to the function *until* are: sequences $u_1$ and $u_2$ representing the output signals of $\varphi_1$ and $\varphi_2$ (with $dom(u_1) = dom(u_2)$), the default value $d$, and the function $f$ used for aggregation; it can be min, max, or the special function $\lambda x, y.\ x$, which returns the value of its first argument and which we use to monitor the formula $\varphi_1 \downarrow \mathrm{U}^{d}\ \varphi_2$. The function *until* scans the input sequences backwards and iterates over intervals where both input signals maintain a constant value ($J$). Each such interval is passed to the function *untilAdd*, which updates the state of the algorithm ($v'$, $s$) and constructs the output signal ($u_r$).

**Fig. 10.** Algorithm for monitoring "until"-formulas.
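The backward scan can be illustrated on uniformly sampled signals. The sketch below is a discrete-time simplification and not the interval-based algorithm of Fig. 10: signals are plain lists of samples, dual values are ignored, and the function name and its signature are assumptions made for this example.

```python
from typing import Callable, List, Sequence


def until(u1: Sequence[float], u2: Sequence[float], d: float,
          f: Callable[[float, float], float]) -> List[float]:
    """Untimed until on samples, computed by a single backward scan.

    out[i] aggregates u1 with f from i up to the first j >= i with u2[j] != 0,
    or equals the default d if there is no such j.
    """
    if len(u1) != len(u2):
        raise ValueError("both signals must have the same domain")
    out: List[float] = [d] * len(u1)
    seen = False          # has a non-zero value of u2 been encountered yet?
    acc = d               # aggregate of u1 up to the nearest such point
    for i in range(len(u1) - 1, -1, -1):
        if u2[i] != 0:
            acc, seen = u1[i], True       # the interval [i, i] starts afresh
        elif seen:
            acc = f(acc, u1[i])           # extend the interval one step back
        out[i] = acc if seen else d
    return out


def first(x: float, y: float) -> float:
    """The special function returning its first argument."""
    return x


phi1 = [5.0, 3.0, 4.0, 1.0, 2.0]
phi2 = [0.0, 0.0, 1.0, 0.0, 0.0]
running_min = until(phi1, phi2, -1.0, min)     # Min phi1 U^d phi2
value_at_hit = until(phi1, phi2, -1.0, first)  # phi1 ↓ U^d phi2
```

Passing `min` or `max` yields the aggregating "until", while `first` keeps the value that `phi1` had at the nearest future point where `phi2` is non-zero. The scan touches every sample once, so the cost is linear in the length of the input.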

### **5 Implementation and Experiments**

We implemented the monitoring algorithm in a prototype tool that is available at https://gitlab.com/abakhirkin/StlEval. The tool has a number of limitations: notably, it can only use piecewise-constant interpolation (so we cannot evaluate examples that use the auxiliary signal $T(t) = t$) and it does not support past-time operators. It is written in C++ and uses double-precision floating-point numbers for time points and signal values. We evaluate the tool using a number of synthetic signals and a number of properties based on the ones described earlier in the paper.

**Signals.** We use the following signals discretized with time step 1.


– $x_{\mathrm{spike}}$ – series of spikes; a single spike is defined for $t \in [0, 125)$ as $x_{\mathrm{spike}}(t) = e^{-\frac{(t - 50)^2}{2 \cdot 10^2}}$, and after that the pattern repeats; see the blue curve in Fig. 9.

**Properties.** We use the following properties:


Some properties are expressed in our language using On- and "until"-operators, and some are STL properties. This allows us to see how much time it takes to monitor a more complicated property in our language (e.g., $\varphi_{\mathrm{stab}}$, stabilization around an unknown value) compared to a similar but simpler STL property (e.g., $\varphi_{\mathrm{stab}-0}$, stabilization around a known value). In our experiments, the difference is a constant factor between 2 and 5.

Table 1 shows the evaluation results. A row gives a formula and a signal shape; a column gives the number of samples in the input signal; and a table cell gives two time figures in seconds: the monitoring time excluding the time required to read the input data, and the total runtime of an executable. We note that for our tool, the total runtime is dominated by the time required to read the input signal from a text file. For the three STL properties, we include the time it took AMT 2.0 (a monitoring tool written in Java [18]) and Breach (a Matlab toolbox partially written in C++ [8]; Breach does not have a standalone executable, so we leave the corresponding columns empty) to evaluate the formula. This way we show that our implementation of STL monitoring has good enough performance to be used as a baseline when evaluating the cost of the added expressiveness in the new language. Time figures were obtained using a PC with a Core i3-2120 CPU and 8 GB RAM running 64-bit Debian 8.


**Table 1.** Monitoring time for different formulas and signals.

#### **6 Conclusion and Future Work**

We describe a new specification language that extends STL with the ability to produce and manipulate real-valued output signals (while in STL, every formula has a Boolean output signal). Properties in the new language are specified in terms of minima and maxima over a sliding window, which can have fixed width, when using a generalization of F- and G-operators, or variable width, when using a new version of "until". We show how the new language can express properties that motivated the creation of more expressive and harder to monitor logics. Offline monitoring for the new language is almost as efficient as STL monitoring; the complexity is linear in the length of the input signal and does not depend on the constants appearing in the formula.

There are multiple directions for future work; perhaps the most interesting one is adding integration over a sliding window (in addition to minimum and maximum). This is already allowed by some formalisms [7], and when added to our language, it will allow us to assert that a signal approximates the behaviour of a system defined by a given differential equation (since we will be able to assert $y(t) \approx \int_0^t x(t)\,dt$). Before making integration available, we wish to investigate how to better deal with approximation errors in a specification language. Finally, we wish to make our language usable in falsification, which means that for every formula with a Boolean output signal we wish to be able to compute a real-valued robustness measure.

**Acknowledgements.** The authors thank T. Ferrère, D. Nickovic, and E. Asarin for comments on the draft of this paper, and O. Lebeltel for providing a version of AMT for the experiments.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **VYPR2: A Framework for Runtime Verification of Python Web Services**

Joshua Heneage Dawes<sup>1,2</sup>(✉), Giles Reger<sup>1</sup>, Giovanni Franzoni<sup>2</sup>, Andreas Pfeiffer<sup>2</sup>, and Giacomo Govi<sup>3</sup>

> <sup>1</sup> University of Manchester, Manchester, UK<br>
> <sup>2</sup> CERN, Geneva, Switzerland, joshua.dawes@cern.ch<br>
> <sup>3</sup> Fermi National Accelerator Laboratory, Batavia, IL, USA

**Abstract.** Runtime Verification (RV) is the process of checking whether a run of a system holds a given property. In order to perform such a check online, the algorithm used to monitor the property must induce minimal overhead. This paper focuses on two areas that have received little attention from the RV community: Python programs and web services. Our first contribution is the VyPR runtime verification tool for single-threaded Python programs. The tool handles specifications in our previously introduced Control-Flow Temporal Logic (CFTL), which supports the specification of state and time constraints over runs of functions. VyPR minimally (in terms of reachability) instruments the input program with respect to a CFTL specification and then uses instrumentation information to optimise the monitoring algorithm. Our second contribution is the lifting of VyPR to the web service setting, resulting in the VyPR2 tool. We first describe the necessary modifications to the architecture of VyPR, and then describe our experience applying VyPR2 to a service that is critical to the physics reconstruction pipeline on the CMS Experiment at CERN.

### **1 Introduction**

Runtime Verification [1] is the process of checking whether a run of a system holds a given property (often written in a temporal logic). This can be checked while the system is running (*online*) or after it has run (*post-mortem* or *offline*). Often this is presented abstractly as checking an abstraction of behaviour, captured by a *trace*. This abstract setting often ignores the practicalities of instrumentation and deployment. This paper presents a tool for the runtime verification of Python-based web services that efficiently handles the instrumentation problem and integrates with the widely used web-framework Flask [2]. This work is carried out within the context of verifying web-services used at the CMS Experiment at CERN.

Despite the wealth of existing logics [3–9], our work [10,11] on verifying state and time constraints over Python-based web services on the CMS Experiment at CERN has shown that, in most cases, the existing logics operate at a high level of abstraction in relation to the program under scrutiny. This leads to (1) a less straightforward specification process for engineers, who have to think indirectly about their programs; and (2) difficulty writing specifications about behaviour inside functions themselves. These observations led us to develop Control-Flow Temporal Logic [10,11] (CFTL), a logic that has a tight coupling with the control flow of the program under scrutiny (so it operates at a lower level of abstraction which, in our experience, makes writing specifications with it easier for engineers) and is easy to use to specify state and time constraints over single runs of functions.

After the introduction of CFTL (Sect. 2), the first contribution of this paper is a description of the VyPR tool (Sect. 3), which verifies single-threaded Python programs with respect to CFTL specifications. It does this by (1) providing PyCFTL, the Python binding for CFTL, for writing specifications; (2) instrumenting the input program minimally with respect to reachability; and (3) using the resulting instrumentation information to make its online monitoring algorithm more efficient.

Since the development of VyPR as a prototype verification tool for CFTL, we have found that there are, to the best of our knowledge, no frameworks for fully automated instrumentation and verification of multiple functions in web services with respect to low-level properties. Therefore, the second contribution of this paper is the lifting of CFTL and VyPR to the web service setting in a tool we call VyPR2 (Sect. 4). We present a general infrastructure for the runtime verification of Python-based web services with respect to CFTL specifications. Moving from VyPR to VyPR2 presents a number of challenges, which we discuss in detail. For the moment, we focus on web services that use the Flask framework, a Python framework that allows one to write a web service by writing Python functions to serve as end-points. VyPR2 admits a simple specification process using PyCFTL, performs automatic and optimised instrumentation of the web service under scrutiny, and provides a separate verdict server for collection of verdicts obtained by monitoring CFTL specifications.

Our final contribution is a case study (Sect. 5) applying VyPR2 to the CMS Conditions Upload Service [12], a single-threaded Python-based web service used on the CMS Experiment at CERN. We find that our verification infrastructure induces minimal overhead on Conditions uploads, with experiments showing an overhead of approximately 4.7%. We also find unexpected violations of the specification, one of which has triggered investigations into a mechanism that was designed to be an optimisation but is in danger of adding unnecessary latency. Ultimately, VyPR2 has made analysis of the performance of a critical part of CMS' physics reconstruction pipeline much more straightforward.

#### **2 Control-Flow Temporal Logic (CFTL)**

Both of the tools presented in this paper make use of the CFTL specification language [10,11]. We briefly describe this language, focusing on the kinds of properties it can capture. CFTL is a linear-time temporal logic whose formulas reason over two central types of objects: *states*, instantaneous *checkpoints* in a program's runtime; and *transitions*, the computation that must happen to move between states.

$$\begin{array}{lcl}
\phi & := & \forall q \in \Gamma_S : \phi \mid \forall t \in \Gamma_T : \phi \mid \phi \lor \phi \mid \neg \phi \mid true \mid \phi_A \\
\phi_A & := & S(x) = v \mid S(x) = S(x) \mid S(x) \in (n, m) \mid S(x) \in [n, m] \\
& & \mid \mathsf{duration}(T) \in (n, m) \mid \mathsf{duration}(T) \in [n, m] \\
\Gamma_S & := & \mathsf{changes}(x) \mid \mathsf{future}_S(q, \mathsf{changes}(x)) \mid \mathsf{future}_S(t, \mathsf{changes}(x)) \\
\Gamma_T & := & \mathsf{calls}(f) \mid \mathsf{future}_T(q, \mathsf{calls}(f)) \mid \mathsf{future}_T(t, \mathsf{calls}(f)) \\
S & := & q \mid \mathsf{source}(T) \mid \mathsf{dest}(T) \mid \mathsf{next}_S(S, \mathsf{changes}(x)) \mid \mathsf{next}_S(T, \mathsf{changes}(x)) \\
T & := & t \mid \mathsf{incident}(S) \mid \mathsf{next}_T(S, \mathsf{calls}(f)) \mid \mathsf{next}_T(T, \mathsf{calls}(f))
\end{array}$$

**Fig. 1.** Syntax of CFTL.

Consider the following property, taken from the case study in Sect. 5:

*Whenever* authenticated *is changed, if it is set to* True, *then all future calls to* execute *should take no more than 1 second.*

This can be expressed in CFTL as

$$\begin{array}{l} \forall q \in \mathsf{changes}(\texttt{authenticated}) : \\ \quad \forall t \in \mathsf{future}(q, \mathsf{calls}(\texttt{execute})) : \\ \qquad q(\texttt{authenticated}) = \texttt{True} \implies \mathsf{duration}(t) \in [0, 1] \end{array} \tag{1}$$

This first quantifies over the states q in which the program variable authenticated is changed and then over the transitions t occurring after that state that correspond to a call of a program function called execute. Given this pair of q and t, the specification then states that if authenticated is mapped to True by q then the duration of the transition t is within the given range.

*Syntax.* Figure 1 gives the syntax of CFTL. CFTL specifications take prenex form consisting of a list of quantifiers followed by a quantifier-free part. The quantification domains are defined by Γ<sub>S</sub> (for states) and Γ<sub>T</sub> (for transitions). Terms produced by the S and T cases denote states and transitions respectively. We often drop the S and T subscripts from future and next when the meaning is clear from the context. The quantifier-free part of CFTL formulas is a boolean combination of *atoms* generated by φ<sub>A</sub>. Let A(ϕ) be the set of atoms of a CFTL formula ϕ and, for α ∈ A(ϕ), let var(α) be the variable on which α is based. In the above example A(ϕ) = {q(authenticated) = True, duration(t) ∈ [0, 1]}, var(q(authenticated) = True) = q, and var(duration(t) ∈ [0, 1]) = t. A CFTL formula is well-formed if it does not contain any free variables (those not captured by a quantifier) and every nested quantifier depends on the previously quantified variable.

*Semantics.* The semantics of CFTL is defined over a *dynamic run* of the program. A dynamic run is a sequence of *states* τ = ⟨σ, t⟩, where σ is a map (a partial function with finite domain) from program variables/functions to values and t ∈ ℝ<sup>≥</sup> is a timestamp. Transitions are then pairs ⟨τ<sub>i</sub>, τ<sub>j</sub>⟩ for states τ<sub>i</sub> and τ<sub>j</sub>. The *product quantification domain* over which a CFTL formula is evaluated is derived from the dynamic run using the quantifier list, e.g. by extracting all states where some variable changes. Elements of the product quantification domain are maps from specification variables to concrete states/transitions and will be referred to as *concrete bindings*.
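To make the semantics concrete, the following sketch evaluates property (1) over a small, hand-built dynamic run in plain Python. It does not use VyPR or PyCFTL. The classes `State` and `Transition`, the `changed` and `called` fields, and the reading of future(q, ·) as "transitions starting at or after q" are all assumptions introduced for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class State:
    """A state <sigma, t>: a map from program variables to values, plus a timestamp."""
    sigma: Dict[str, object] = field(default_factory=dict)
    time: float = 0.0
    changed: Optional[str] = None   # hypothetical: variable assigned on entering the state


@dataclass
class Transition:
    """A pair of states; `called` names the function whose call it represents, if any."""
    src: State
    dst: State
    called: Optional[str] = None

    def duration(self) -> float:
        return self.dst.time - self.src.time


def property_1_holds(states: List[State], transitions: List[Transition]) -> bool:
    """Check property (1) for every concrete binding [q -> state, t -> transition]."""
    for q in states:
        if q.changed != "authenticated":          # q ranges over changes(authenticated)
            continue
        for t in transitions:
            # t ranges over future(q, calls(execute))
            if t.called != "execute" or t.src.time < q.time:
                continue
            if q.sigma.get("authenticated") is True and not 0 <= t.duration() <= 1:
                return False                      # a violating binding was found
    return True


# A run in which authenticated is set to True and execute then takes 0.5 s.
s0 = State({}, 0.0)
s1 = State({"authenticated": True}, 1.0, changed="authenticated")
s2 = State({"authenticated": True}, 2.0)
s3 = State({"authenticated": True}, 2.5)
verdict = property_1_holds([s0, s1, s2, s3], [Transition(s2, s3, called="execute")])
```

Each pair of a state `q` and a transition `t` visited by the two loops plays the role of one concrete binding; the property holds on the run only if no binding violates the quantifier-free part.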

### **3 VYPR**

We now present VyPR, which can perform runtime verification on a single Python function with respect to some CFTL specification ϕ. Further details can be found in a paper [11] and technical report [10], and the tool is available online at http://cern.ch/vypr/.

*Tool Workflow.* Runtime verification of a Python function proceeds in three steps. First, the property is captured as a CFTL specification using a Python binding called PyCFTL. Given this specification, VyPR then instruments the input program so that the monitoring algorithm receives data from any points in the program that could contribute to a verdict. Finally, the modified program communicates with the monitor at runtime, which processes the observations to produce a verdict.

#### **3.1 Writing CFTL Specifications with PyCFTL**

The first step is to write a CFTL specification. Note that such a specification is specific to a particular function being verified as it refers directly to the symbols in that function. For specification we provide PyCFTL, a Python binding for CFTL. Figure 2 shows the PyCFTL specification for the CFTL specification in Eq. 1. A CFTL specification is defined in PyCFTL in two parts:

1. The first part is the quantification sequence. For example, the quantification ∀q ∈ changes(x) is given as Forall(q = changes('x')).

2. The second part, the argument to Check(), gives the property to be evaluated for each concrete binding in the quantification domain. This is done by specifying a *template* for the specification with a lambda expression (an anonymous function in Python) whose arguments match the variables in the quantification sequence.

#### **3.2 Instrumenting for CFTL**

VyPR instruments a Python program for a CFTL specification ϕ by building up the set Inst containing all points in the program that could contribute to the verdict of ϕ. VyPR works at the level of the *abstract syntax tree* (AST) of the program and the program points of interest are nodes in the AST. Once this set of nodes has been computed, the AST is modified to add instruments at each of these points.

During runtime monitoring the most expensive operation is usually the lookup of the relevant monitor state that needs to be modified. To make monitoring more efficient, our instrumentation algorithm computes Inst by computing a direct lookup structure that allows the monitoring algorithm to go directly to this state. This structure can be abstractly viewed as a tree, Hϕ, whose leaves are sets that form a partition of Inst and whose intermediate nodes contain the information required to identify the relevant monitoring state.

The first step in computing H<sup>ϕ</sup> is to construct the *Symbolic Control-Flow Graph* (SCFG) of the body of a (Python) function f.

**Definition 1.** *A symbolic control-flow graph (SCFG) is a directed graph* ⟨V, E, v<sup>s</sup>⟩ *where* V *is a finite set of symbolic states (maps from all program symbols, e.g. program variables/functions, to a status in* {*changed*, *unchanged*, *called*, *undefined*}*),* E ⊆ V × V *is a finite set of edges, and* v<sup>s</sup> ∈ V *is the initial symbolic state.*
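Definition 1 can be modelled directly as a small data structure. The sketch below is an assumption-laden illustration rather than VyPR's representation: the class names, the `label` field, and the convention that unlisted symbols count as *unchanged* are all introduced here for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SymbolicState:
    """A symbolic state: a label plus a map from symbols to a status."""
    label: str
    status: tuple  # pairs (symbol, status); a tuple keeps the state hashable

    def status_of(self, symbol):
        # Simplification: symbols that are not listed count as unchanged.
        return dict(self.status).get(symbol, "unchanged")


@dataclass
class SCFG:
    initial: SymbolicState
    edges: dict = field(default_factory=dict)  # state -> list of successors

    def add_edge(self, source, target):
        self.edges.setdefault(source, []).append(target)
        self.edges.setdefault(target, [])

    def states(self):
        return list(self.edges)

    def reaches(self, source, target):
        """Breadth-first search: is `target` reachable from `source`?"""
        seen, frontier = {source}, deque([source])
        while frontier:
            current = frontier.popleft()
            if current == target:
                return True
            for successor in self.edges.get(current, []):
                if successor not in seen:
                    seen.add(successor)
                    frontier.append(successor)
        return False


# SCFG for the straight-line body:  x = 1; f(); x = 2
start = SymbolicState("start", (("x", "undefined"),))
s1 = SymbolicState("x = 1", (("x", "changed"),))
s2 = SymbolicState("f()", (("f", "called"),))
s3 = SymbolicState("x = 2", (("x", "changed"),))
scfg = SCFG(initial=start)
for a, b in [(start, s1), (s1, s2), (s2, s3)]:
    scfg.add_edge(a, b)

# Symbolic states that could generate bindings for changes(x).
candidates = [v for v in scfg.states() if v.status_of("x") == "changed"]
```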

The SCFG of a function f is independent of any property ϕ being checked. Our construction of the SCFG of a program encodes information about state changes (by symbolic states) and reachability (by edges being generated for each state-changing instruction in code), making it an ideal structure from which to derive candidate points for state changes. The SCFG is used to find all symbolic states or edges that *could* generate concrete bindings in the product quantification domain of a formula. For example, if the CFTL specification is ∀q ∈ changes(x) : q(x) < 10, all symbolic states representing changes to x will be identified as having potential to generate concrete bindings. From this, we construct a set of *static* bindings, which are maps from specification variables to candidate symbolic states/edges in the SCFG. The key distinction between *concrete* and *static* bindings is that static bindings are computed from the SCFG before runtime, and can correspond to zero or more concrete bindings during runtime. We call the set of static bindings the *binding space* for ϕ with respect to the SCFG and denote it by B<sup>ϕ</sup> with the SCFG implicit. Elements β of B<sup>ϕ</sup> form the top level of the tree Hϕ.

**Data:** ϕ and the SCFG ⟨V, E, v<sup>s</sup>⟩ of function f

**Result:** Lookup tree H<sup>ϕ</sup>

// Construct B<sup>ϕ</sup>

B<sup>ϕ</sup> = {∅};

**foreach** *quantified variable* (x<sup>i</sup> ∈ predicate) *in* ϕ *in order* **do**

&emsp;**for** v ∈ V **do**

&emsp;&emsp;**if** v *is a candidate for* predicate **then**

&emsp;&emsp;&emsp;B<sup>ϕ</sup> = {β ∪ [x<sup>i</sup> → v] | β ∈ B<sup>ϕ</sup> ∧ (i > 1 → reaches(β(x<sup>i−1</sup>), v))};

&emsp;**end**

**end**

// Construct H<sup>ϕ</sup>

H<sup>ϕ</sup> = ∅;

**for** β ∈ B<sup>ϕ</sup> *with index* i<sup>β</sup> **do**

&emsp;**for** *quantified variable* x<sup>i</sup> *in* ϕ *with index* i<sup>q</sup> **do**

&emsp;&emsp;**foreach** α ∈ {α ∈ A(ϕ) | var(α) = x<sup>i</sup>} *with index* i<sup>α</sup> **do**

&emsp;&emsp;&emsp;H<sup>ϕ</sup>(⟨i<sup>β</sup>, i<sup>q</sup>, i<sup>α</sup>⟩) ← lift(α, β(x<sup>i</sup>));

&emsp;&emsp;**end**

&emsp;**end**

**end**

**Algorithm 1:** VyPR's algorithm for construction of the tree Hϕ.

Once B<sup>ϕ</sup> is constructed, for each β ∈ Bϕ, VyPR lifts each α ∈ A(ϕ) (the atoms of ϕ) from the dynamic context to the SCFG in order to find the relevant symbolic states/edges around the symbolic state/edge β(var(α)). This process constructs the second and third levels of the tree Hϕ: the second level consisting of variables, and the third level of atoms in A(ϕ). The leaves on the fourth level of the tree H<sup>ϕ</sup> are then the subsets of Inst; sets of symbolic states or edges from the SCFG.

Whilst we can abstractly view H<sup>ϕ</sup> as a tree, in practice we represent it as a map from triples ⟨i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩ to symbolic states/edges of the SCFG, where i<sup>B</sup>, i<sup>∀</sup> and i<sup>α</sup> are indices into the binding space, quantifier list, and set of atoms respectively. An instrument placed in the input program for an atom α, using H<sup>ϕ</sup>, contains a triple to identify a subset of Inst and a value obs, which is whatever code is required to obtain the value necessary to compute a truth value for α. For example, if the instrument is being placed to record the value of a program variable, obs is the name of the variable which, at runtime, is evaluated to give the value the variable holds. Such an instrument, which pushes its triple and evaluated obs value to a queue to be consumed by the monitoring thread, is placed by modifying the Abstract Syntax Tree (AST) of the program.
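Placing such an instrument can be illustrated with Python's standard `ast` module. The following is a sketch under stated assumptions, not VyPR's instrumentation code: the `record` function, the `events` queue, and the fixed triple `(0, 0, 0)` are hypothetical, and only assignments to a plain variable name are instrumented.

```python
import ast
import queue

events = queue.Queue()  # stand-in for the queue read by the monitoring thread


def record(triple, value):
    """Hypothetical instrument body: push the index triple and the obs value."""
    events.put((triple, value))


class Instrumenter(ast.NodeTransformer):
    """Insert a record(...) call after every assignment to `name`."""

    def __init__(self, name, triple):
        self.name = name
        self.triple = triple

    def visit_Assign(self, node):
        hit = any(isinstance(t, ast.Name) and t.id == self.name
                  for t in node.targets)
        if not hit:
            return node
        # obs is the variable name; it is evaluated when the program runs.
        call = ast.parse(f"record({self.triple!r}, {self.name})").body[0]
        return [node, call]  # statement visitors may return a list of nodes


source = """
def f(n):
    x = 0
    for i in range(n):
        x = x + i
    return x
"""

tree = Instrumenter("x", (0, 0, 0)).visit(ast.parse(source))
ast.fix_missing_locations(tree)
namespace = {"record": record}
exec(compile(tree, "<instrumented>", "exec"), namespace)

result = namespace["f"](3)
observed = []
while not events.empty():
    observed.append(events.get())
```

The assignment inside the loop is a single program point, yet it pushes one observation per iteration, which is why one static point can give rise to several runtime observations.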

Our algorithm for construction of H<sup>ϕ</sup> is Algorithm 1. This makes use of a predicate reaches which checks whether one symbolic state is reachable from another in the SCFG; and a function lift(α, v) for α ∈ A(ϕ) and v ∈ V which gives the symbolic states reachable from v obtained by lifting α to the static context. With the tree H<sup>ϕ</sup> and binding space B<sup>ϕ</sup> defined, in the next section we present our monitoring approach.
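One possible reading of Algorithm 1 is sketched below. It is an interpretation under simplifying assumptions: the SCFG is an adjacency-list dictionary over string labels, candidates are supplied by a caller-provided `is_candidate` function, and the `lift` used in the example simply returns the bound point itself, which is not how lifting is defined for every atom.

```python
def reaches(graph, source, target):
    """Depth-first reachability in an adjacency-list graph."""
    stack, seen = [source], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return False


def binding_space(graph, quantifiers, is_candidate):
    """quantifiers: ordered list of (variable, predicate) pairs."""
    bindings = [{}]
    for i, (variable, predicate) in enumerate(quantifiers):
        previous = quantifiers[i - 1][0] if i > 0 else None
        extended = []
        for beta in bindings:
            for v in graph:
                if not is_candidate(v, predicate):
                    continue
                # From the second variable on, v must be reachable from the
                # point bound to the previous variable.
                if previous is None or reaches(graph, beta[previous], v):
                    extended.append({**beta, variable: v})
        bindings = extended
    return bindings


def lookup_tree(bindings, quantifiers, atoms, lift):
    """Map (i_beta, i_q, i_alpha) to the set of points to instrument."""
    tree = {}
    for i_beta, beta in enumerate(bindings):
        for i_q, (variable, _) in enumerate(quantifiers):
            for i_alpha, (atom_variable, atom) in enumerate(atoms):
                if atom_variable == variable:
                    tree[(i_beta, i_q, i_alpha)] = lift(atom, beta[variable])
    return tree


graph = {"start": ["x=1"], "x=1": ["f()"], "f()": ["x=2"], "x=2": []}
candidate_points = {"changes(x)": {"x=1", "x=2"}, "calls(f)": {"f()"}}
quantifiers = [("q", "changes(x)"), ("t", "calls(f)")]
atoms = [("q", "q(x) < 10"), ("t", "duration(t) in [0, 1]")]

B = binding_space(graph, quantifiers,
                  lambda v, predicate: v in candidate_points[predicate])
H = lookup_tree(B, quantifiers, atoms, lift=lambda atom, point: {point})
```

Here the call `f()` is not reachable from `x=2`, so only one static binding survives and the tree has two leaves, one per atom.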

#### **3.3 Monitoring for CFTL**

The modified version of the body of f resulting from instrumentation is run alongside VyPR's monitoring algorithm, which consumes data from instruments via a consumption queue populated by the main program thread. Monitoring is performed asynchronously. VyPR's monitoring algorithm involves instantiating a formula tree (an and-or tree) for each binding in the quantification domain of a formula. This algorithm uses the triple ⟨i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩ and evaluated obs value given by each instrument to perform lookup (to find in which formula trees to update the truth value of a specific atom), decide if new formula trees should be instantiated and compute the truth value of the atom at index i<sup>α</sup> in A(ϕ).

Given a CFTL formula ∀q<sup>1</sup> ∈ Γ1,..., ∀q<sup>n</sup> ∈ Γ<sup>n</sup> : ψ(q1,...,qn), when monitoring one can interpret multiple quantification as single quantification over a product space Γ<sup>1</sup> ×··· × Γn. Such a space contains concrete bindings [q<sup>1</sup> → v1,...,q<sup>n</sup> → vn] for states or transitions vi. Each of these concrete bindings generated at runtime corresponds to a single static binding β ∈ Bϕ. Using this correspondence, we say that each concrete binding has a *supporting static binding* β ∈ Bϕ.

Given that monitoring is performed by instantiating a formula tree for each concrete binding in the product quantification domain, the speed of lookup of relevant formula trees is greatly increased by grouping them by the indices of supporting static bindings (determined by iB). Hence, to either update or instantiate formula trees, when information is observed from an instrument that helps to evaluate ψ at some concrete binding, the supporting static binding must be found, giving rise to the requirement for static information during monitoring. During monitoring, lookup of which set of formula trees to use is straightforward since the index i<sup>B</sup> is given by the instrument.

Once lookup has been performed, the result is a set of formula trees corresponding to the static binding index i<sup>B</sup> received from the instrument. From here, the index i<sup>α</sup> is used to determine the atom in A(ϕ) whose truth value (computed using the value given by obs) must be updated in each formula tree.
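The lookup described above can be sketched as a consumer thread that groups formula trees by static binding index. This is a deliberately reduced model and not VyPR's monitor: a formula tree is just a dictionary from atom index to truth value, every observation is assumed to start a new concrete binding (reasonable only for a single-quantifier property such as q(x) < 10), and the `STOP` sentinel is an assumption of the example.

```python
import queue
import threading

STOP = object()  # sentinel telling the monitoring thread to finish


def monitor(events, atoms, trees_by_binding):
    """Consume (triple, obs) pairs and update formula trees.

    trees_by_binding maps a static binding index i_B to the list of formula
    trees it supports; each tree is modelled as {atom index: truth value}.
    """
    while True:
        item = events.get()
        if item is STOP:
            break
        (i_b, _i_q, i_alpha), obs = item
        truth = atoms[i_alpha](obs)
        # Simplification: every observation starts a new concrete binding.
        trees_by_binding.setdefault(i_b, []).append({i_alpha: truth})


events = queue.Queue()
atoms = [lambda value: value < 10]  # the single atom of q(x) < 10
trees = {}
worker = threading.Thread(target=monitor, args=(events, atoms, trees))
worker.start()

# The instrumented program would push these; here they are written by hand.
for value in (3, 12, 7):
    events.put(((0, 0, 0), value))
events.put(STOP)
worker.join()

verdicts = [all(tree.values()) for tree in trees[0]]
```

Because the index i<sup>B</sup> arrives with each event, finding the relevant group of formula trees is a single dictionary access.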

#### **3.4 Verdict Reports**

Once execution has finished, a verdict report is generated, which VyPR keeps in memory. Since each formula tree corresponds to a single concrete binding, verdicts share concrete bindings' correspondence with static bindings. Hence, verdicts can be grouped by the supporting static bindings. Given the binding space B<sup>ϕ</sup> computed during instrumentation, a verdict report V from a single run of a function can be seen as a partial function

$$\mathcal{V}: \mathcal{B}_{\varphi} \to (\{\top, \bot\} \times \mathbb{R}_{\geq})^{*},$$

sending a static binding β ∈ B<sup>ϕ</sup> to a sequence of pairs containing a verdict from {⊤, ⊥} and a timestamp (the time at which the verdict was obtained). The map V sends static bindings to sequences of pairs, rather than single pairs, because single static bindings can support multiple concrete bindings, generating multiple verdicts. This is the case if, for example, the static binding is inside a loop that iterates more than once at runtime.
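A verdict report of this shape can be modelled as a dictionary of lists. The sketch below is illustrative only: the class and method names are hypothetical, and `time.monotonic()` is used as a stand-in timestamp source.

```python
import time
from collections import defaultdict


class VerdictReport:
    """Partial map from static binding index to (verdict, timestamp) pairs."""

    def __init__(self):
        self._verdicts = defaultdict(list)

    def record(self, binding_index, verdict):
        self._verdicts[binding_index].append((verdict, time.monotonic()))

    def __call__(self, binding_index):
        # Partial function: None where no verdict was reached.
        if binding_index not in self._verdicts:
            return None
        return list(self._verdicts[binding_index])


report = VerdictReport()
# One static binding inside a loop that runs three times at runtime.
for x in (3, 12, 7):
    report.record(0, x < 10)
```

Calling `report(0)` returns three pairs for the single static binding, while `report(1)` returns `None` because that binding produced no verdict.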

### **4 An Architecture for Web Service Verification**

We begin our description of the architecture of VyPR2, the extension of VyPR to web services, by isolating a number of requirements imposed by web service deployment environments, and production software environments in general, that must be met.

The environment at CERN inside which our verification infrastructure must function is similar to most production environments. It consists of machines for development and production, with each machine automatically pulling the relevant tags from a central repository once engineers have pushed their (locally-tested) code. Based on this deployment architecture, and the architecture of web services, requirements for our Runtime Verification framework include:

*Centralised specifications over multiple functions with multiple properties.* It should be possible to verify each function in a web service with respect to multiple properties. Further, specifications for the whole web service should be written in a single file, to minimise intrusion into the web service's code.

*Making instrumentation data persistent.* Web services' code can be pulled from a repository onto a production server and, once launched, be restarted multiple times between successive deployments of different code versions. Therefore, instrumentation data must be persistent between processes.

*Persistent verdict data.* Similarly, verdict data must be persistent and, furthermore, engineers must be able to perform offline analysis of the verdicts reached by web services at runtime.

An architecture that meets these requirements is illustrated in Fig. 3, and described in the following sections. The resulting tool, VyPR2, will soon be publicly available from http://cern.ch/vypr.

#### **4.1 Specifying Multiple Function, Multiple Property Specifications**

For simplicity of use, we have opted to have engineers write their entire specification in a central configuration file, in the root directory of their web service. This is a file written in Python, specifying CFTL properties over the service using the PyCFTL library.

Part of such a configuration file, using the PyCFTL specification given in Fig. 2, is shown in Fig. 4: one must first give the fully-qualified name of the module in the service in standard Python *dot* notation and then, for each function, the list of properties built up using PyCFTL.

**Fig. 4.** A CFTL specification and its PyCFTL equivalent.

#### **4.2 Instrumentation**

Given a specification such as that in Fig. 4, VyPR's strategy must be extended to the multiple function, multiple property context. Multiple functions are dealt with by constructing the SCFG for each function found in the specification and performing instrumentation for each property.

Instrumentation for each property over the same function is performed sequentially: VyPR2 instruments using the AST of the input code, and so instrumentation for each property progressively modifies the AST.

We now describe the modifications required to the actual instruments. In VyPR's simplified setting, instruments need only send the ⟨i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩ triple along with the obs value relevant to the atom for which the instrument was placed. The multiple function, multiple property setting yields several problems that are solved by modifying existing instruments and adding a new kind.

In our architecture, monitoring is performed by a single thread, which means that this thread must have a way to distinguish between instruments received from different functions. We accomplish this by adding the name of the function to all instruments added to code. By adding the name of the function to all instruments, we deal not only with multiple functions, but with monitored functions calling other monitored functions, in which case monitor states for multiple functions must be maintained at the same time.

We deal with multiple properties over the same function by adding a unique identifier of a property to each of its instruments. We compute a uniquely identifying string for each property by taking the SHA1 hash of the combination of the quantification sequence and the template. We add this unique identifier to each instrument, giving the monitoring algorithm a way to distinguish properties.

Taking the original triple ⟨i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩, the appropriate obs code, and the new requirements for the function name and the property hash, the new form of instruments that are placed by VyPR2 is ⟨function, hash, obs, i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩.
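The extended instrument can be sketched with the standard `hashlib` module. How VyPR2 serialises the quantification sequence and template before hashing is not stated, so the string concatenation below, the `Instrument` tuple type, and the example property strings are assumptions.

```python
import hashlib
from collections import namedtuple

# Field order follows the instrument form given in the text.
Instrument = namedtuple("Instrument", "function hash obs i_b i_q i_alpha")


def property_hash(quantification, template):
    """SHA1 of the quantification sequence combined with the template.

    Assumption: both parts are available as strings and are concatenated.
    """
    combined = f"{quantification}{template}".encode("utf-8")
    return hashlib.sha1(combined).hexdigest()


hash_1 = property_hash("Forall(q = changes('x'))", "q(x) < 10")
hash_2 = property_hash("Forall(q = changes('x'))", "q(x) < 20")

instrument = Instrument("app.routes.check_hashes", hash_1, "x", 0, 0, 0)
```

Two properties over the same function differ in their hash, so the monitoring thread can tell their instruments apart even when the index triples coincide.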

#### **4.3 Making Instrumentation Data Persistent**

The tree H<sup>ϕ</sup> is dependent on the CFTL formula ϕ for which it has been computed. Hence, if the specification for a given function in the web service consists of a set ¯ϕ = {ϕ1,...,ϕn} of CFTL formulas, the data required to monitor each property at the same time over the same execution of the given function consists of the set of maps H<sup>ϕ</sup>*<sup>i</sup>* which can be identified by ϕi. In particular, when data is received from an instrument by the monitoring algorithm, we can assume from Sect. 4.2 that it will contain a unique identifier for the formula for which it was placed. Therefore, the correct tree H<sup>ϕ</sup>*<sup>i</sup>* can be determined for each instrument.

We make such instrumentation data persistent by creating new directories in the root of the web service called binding\_spaces and instrumentation\_maps to hold the binding spaces and trees, respectively, computed for each function/CFTL property combination. To dump the binding spaces and hierarchy functions in files in these directories, we use Python's pickle [13] module.
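The persistence step can be sketched with `pickle` as follows. The directory names are those given above; the file naming scheme, the function signatures, and the example data are hypothetical, and a temporary directory stands in for the root of the web service.

```python
import os
import pickle
import tempfile


def dump_instrumentation_data(root, function, prop_hash, binding_space, tree):
    """Write one binding space and one tree for a function/property pair."""
    for directory, data in (("binding_spaces", binding_space),
                            ("instrumentation_maps", tree)):
        os.makedirs(os.path.join(root, directory), exist_ok=True)
        # Hypothetical naming scheme: <function>-<hash>.pickle
        path = os.path.join(root, directory, f"{function}-{prop_hash}.pickle")
        with open(path, "wb") as handle:
            pickle.dump(data, handle)


def load_instrumentation_data(root):
    """Rebuild a map from (function, property hash) to (binding space, tree)."""
    loaded = {}
    spaces = os.path.join(root, "binding_spaces")
    maps = os.path.join(root, "instrumentation_maps")
    for file_name in os.listdir(spaces):
        function, prop_hash = file_name[:-len(".pickle")].rsplit("-", 1)
        with open(os.path.join(spaces, file_name), "rb") as handle:
            binding_space = pickle.load(handle)
        with open(os.path.join(maps, file_name), "rb") as handle:
            tree = pickle.load(handle)
        loaded[(function, prop_hash)] = (binding_space, tree)
    return loaded


with tempfile.TemporaryDirectory() as root:
    dump_instrumentation_data(root, "app.routes.check_hashes", "3f2a",
                              [{"q": "state_7"}], {(0, 0, 0): {"state_7"}})
    restored = load_instrumentation_data(root)
```

Reading the files back in a later process yields the same binding space and tree, so instrumentation does not need to be recomputed on each restart.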

#### **4.4 Activating Verification in a Web Service**

Our infrastructure is designed to minimise intrusion, both by minimising the amount of instrumentation performed and by minimising the amount of code engineers must add to their services for verification to be performed.

With the Flask-based implementation of VyPR2 that we present here, one can *activate* verification by adding the lines from vypr import Verification and verification = Verification(app) where app is the Flask application object required when building a web service with the Flask framework.

Running verification = Verification(app) will start up the separate monitoring thread, similar to VyPR, and will also read the serialised binding spaces and trees from the directories described in Sect. 4.3. It will subsequently place them in a map G from ⟨module.function, property hash⟩ pairs to objects containing the unserialised forms of the binding spaces and trees.

#### **4.5 A Modified Monitoring Algorithm**

VyPR's algorithm uses the tuple ⟨i<sup>B</sup>, i<sup>∀</sup>, i<sup>α</sup>⟩ with H<sup>ϕ</sup> to determine the set of formula trees to update. In this case, H<sup>ϕ</sup> is fixed. However, in the web service setting, the additional information regarding the current function that has control and the property to update is present and required to find the correct binding space and tree given by G. From here the process is the same as that used by VyPR, since the monitoring problem has once again collapsed to monitoring a single property over a single function.
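The extra lookup step can be sketched as one dictionary access in front of the single-property lookup. The contents of `G`, the placeholder hashes, and the state labels below are invented for illustration.

```python
def dispatch(G, instrument):
    """Select the binding space and tree for an instrument, then look up."""
    function, prop_hash, obs, i_b, i_q, i_alpha = instrument
    binding_space, tree = G[(function, prop_hash)]
    # From here on the problem is single-function, single-property again.
    return binding_space[i_b], tree[(i_b, i_q, i_alpha)], obs


# Hypothetical map G with two function/property pairs.
G = {
    ("app.routes.check_hashes", "hash_a"):
        ([{"q": "state_3"}], {(0, 0, 0): {"state_3"}}),
    ("app.routes.store_blobs", "hash_b"):
        ([{"t": "state_9"}], {(0, 0, 0): {"state_9"}}),
}

first = dispatch(G, ("app.routes.check_hashes", "hash_a", 0.12, 0, 0, 0))
second = dispatch(G, ("app.routes.store_blobs", "hash_b", 1.5, 0, 0, 0))
```

Both instruments carry the same index triple, yet they resolve to different static bindings because the function name and property hash select different entries of `G`.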

#### **4.6 A Verdict Server**

For a CFTL formula ∀q<sup>1</sup> ∈ Γ<sup>1</sup>,..., ∀q<sup>n</sup> ∈ Γ<sup>n</sup> : ψ(q<sup>1</sup>,...,q<sup>n</sup>) over a function f, we use *verdicts* to refer to the sequence of truth values in ({⊤, ⊥} × ℝ<sub>≥</sub>)<sup>∗</sup>, where ψ(q<sup>1</sup>,...,q<sup>n</sup>) generates a truth value in {⊤, ⊥} for each binding in Γ<sup>1</sup> × ··· × Γ<sup>n</sup> at a time t ∈ ℝ<sub>≥</sub>. To store such verdicts from a specification written over a web service, we now present the most substantial modification to VyPR's architecture: a central server to collect verdicts. This is, in itself, a separate system; communication with it takes place via HTTP. It consists of two major components:


We omit further discussion of the server and first state some facts regarding our relational schema. Functions and properties are paired, so multiple properties over a single function yield multiple pairs; HTTP requests are used to group function calls; function calls correspond to function/property pairs; and verdicts are organised into bindings belonging to a function/property pair. With these facts in mind, one can answer questions such as:


### **5 An Application: The CMS Conditions Uploader**

We now present the details of the application of VyPR2 to the CMS Conditions Upload Service. We begin by introducing the data with which the CMS Conditions Upload Service works. We then give a brief overview of the existing performance analysis approaches taken at CERN, before describing our approach for replaying real data from LHC runs. Finally, we give our specification and present an analysis of the verdicts derived by monitoring the Conditions Uploader with input taken from our test data, consisting of on the order of 10<sup>4</sup> inputs recorded during LHC runs.

#### **5.1 Conditions Data, Their Computation and Upload**

CERN is home to the Large Hadron Collider (LHC) [14], the largest and most powerful particle accelerator ever built. At one of the interaction points on the LHC beamline lies the Compact Muon Solenoid (CMS) [15], a general purpose detector which is a composite of sub-detector systems. Physics analysis at CERN requires reconstruction, a process whose input consists of both Event (collisions) and Non-Event (alignment and calibrations, or Conditions) data. The lifecycle of Conditions data begins with its computation during LHC runs, and ends with its upload to a central Conditions database. The service responsible for this upload is the CMS Conditions Upload service. A precise understanding of its performance is vital given planned upgrades to the LHC that will increase the amount of data taken.

The Conditions data used in reconstruction by CMS must define (1) the alignment and calibration constants associated with a particular subdetector of CMS and (2) the time (run of the LHC) during which those constants are valid. The atomic unit of Conditions is the *Payload*, which is a serialised C++ class whose fields are specific to the subdetector of CMS to which the class corresponds. We define when a Payload applies to the subdetector by associating with it an *Interval of Validity* (IOV). We then group IOVs into sequences by defining *Tags*, which define to which subdetector each Payload associated with the IOVs it contains applies.

The CMS Conditions Uploader is used for release of Conditions by the automated Conditions computation that takes place at Tier 0 [16] (CERN's local computing grid) and detector experts who require their own Conditions. The Uploader is responsible for checking whether the Conditions proposed are valid before inserting the Conditions into the central database.

#### **5.2 A Specification**

We now give the specification with which we tested the Upload service on the upload data we collected, along with an interpretation for each property. These were written in collaboration with engineers working on the service.

1. app.usage.Usage.new\_upload\_session

∀q ∈ changes(authenticated) : ∀t ∈ future(q, calls(execute)) : (q(authenticated) = True ⟹ duration(t) ∈ [0, 1])

*Whenever* authenticated *is changed, if it is set to* True*, then all future calls to* execute *should take no more than 1 second.*

2. app.routes.check\_hashes

∀q ∈ changes(hashes) : duration(next(q, calls(find\_new\_hashes))) ∈ [0, 0.3]

*When the variable* hashes *is assigned, the next call to* find\_new\_hashes *should take no more than 0.3 seconds.*

3. app.routes.store\_blobs

∀t ∈ calls(con.execute) : duration(t) ∈ [0, 2]

*Every call to the* con.execute *method on the current database connection should take no more than 2 seconds.*

4. app.metadata\_handler.MetadataHandler.\_\_init\_\_

∀t ∈ calls(insert\_iovs) : duration(next(t, calls(commit))) ∈ [0, 1]

*Every time the method* insert\_iovs *is called, the next commit after the insertion should take no more than 1 second.*

5. app.routes.upload\_metadata


#### **5.3 Analysis of Verdicts**

We present our analysis of the Conditions uploader with respect to the specification in Sect. 5.2. The analysis is performed in two parts:


*Complete Replay.* Figure 5 shows the results of monitoring our specification over a dataset of 14,610 uploads. The x axis is function/property pair IDs from the verdict database snapshot used to generate the plot. The ID to property correspondence is such that ID 99 refers to property 1; ID 100 to property 2; ID 101 to property 3; ID 102 to property 4; and ID 103 to property 5. Clearly, from this plot, the violations of property 2 exceed those caused by other properties by an order of magnitude. The check\_hashes function carries out an optimisation that we call *hash checking*, used to make sure that a Conditions upload only sends the Payloads that are not already in the target Conditions database. This is possible because Payloads are uniquely identifiable by their hashes. This optimisation reduces the time spent on Payload uploads by an order of magnitude [12], but the frequency of violation in Fig. 5 suggests that the optimisation itself may be causing unacceptable latency.

**Fig. 5.** A plot of number of violations vs properties in the specification, monitored over 14,610 uploads.

**Fig. 6.** A plot of violations of parts of our specification vs the replay of the 900 upload dataset. (Color figure online)

*Single Tag Replay.* Figure 6 shows the results of monitoring a subset of our specification over a dataset of ≈ 900 uploads from a single Tag in the Conditions database. In this case, the x axis is runs of this upload dataset performed with varying delays between uploads, and the y axis is the number of violations based on a specification with 3 properties. This plot is of interest because, for the ≈ 300 Payloads inserted during this replay, it shows that the latency experienced by those insertions (in terms of violations of property 3, shown in orange) decreases as the delay between uploads increases.

#### **5.4 Resulting Investigation**

Based on the observations presented in Sect. 5.3, we have made investigation of the number of violations caused by *hash checking* a priority. It is recognised that this process is required, and its addition to the Conditions Uploader was a significant optimisation, but the optimisation can only be considered as such if it does not introduce unacceptable overhead to the upload process.

It is also clear that we should understand the pattern of violations in Fig. 6 more precisely. Given that the Conditions Uploader must operate successfully with both the current and upgraded LHC, it is a priority to understand the behaviour of the Uploader under varying frequencies of uploads. We suspect that investigation into the pattern seen in Fig. 6 will result in modification of either the Conditions Uploader's code, or the way in which Conditions are sent for upload during LHC runs.

#### **5.5 Performance**

We now describe the time and space overhead induced by using VyPR2 to monitor the specification in Sect. 5.2 over the Conditions Uploader. We consider both the time overhead on a single upload, and the space required to store intermediate instrumentation data.

When measuring the time overhead, we found that running our complete upload dataset in a short period of time resulted in erratic database latency (the dataset was recorded over 7 months), so we opted instead to run a single upload 10 times with and without monitoring. This provided a more realistic upload scenario, and allowed us to see the overhead induced with respect to a single upload process (the process varies depending on the Conditions being uploaded). The result, from 10 runs of the same upload, was an average time overhead of 4.7%. Uploads are performed by a client sending the Conditions to the upload server over multiple HTTP requests, so this overhead is measured starting from when the first request is received by the upload server to when the last response is sent.

The space required to store all of the necessary instrumentation data for the specification in Sect. 5.2 is divided into space for *binding spaces* (Bϕ), *instrumentation maps* (Hϕ) and indices (a map from property hashes to the position in the specification at which they are found). The binding spaces took up 170 KB, the instrumentation maps 173 KB and the index map 4.3 KB, giving a total space overhead for instrumentation data storage of 347.3 KB.

### **6 Related Work**

To the best of our knowledge, there is no existing work on Runtime Verification of web services. We are also unaware of other (available and maintained) RV tools for Python (there is Nagini [17], but this focuses on static verification), as most either operate offline (on log files) or focus on other languages such as Java [5,7,18] using AspectJ for instrumentation, C [19], or Erlang [20]. Few RV tools consider the instrumentation problem within the tool. The main exception is Java-MaC [3], which also uses the specification to rewrite the Java code directly.

*High-Energy Physics.* In High Energy Physics, any form of monitoring concentrates on instrumentation in order to carry out manual inspection. For example, the instrumentation and subsequent monitoring of CMS' PhEDEx system for transfer of physics data was performed [21] and resulted in the identification of areas in which latency could be improved. Closer to the case study we present here, CMS uses the pclMon tool to monitor Conditions computation [22]. Finally, the Frontier query caching system performs offline monitoring by analysing logs [23]. None of these approaches uses a formal specification language, and they all collect a single type of statistics for a single defined use case. In contrast, VyPR2 is *configurable* in the sense that one can change the specification being checked using our formal specification language, CFTL.

### **7 Conclusion**

We have introduced the VyPR tool for monitoring single-threaded Python programs with respect to CFTL specifications, expressed using the PyCFTL library for Python. We then highlighted the problems that one must solve to extend VyPR's architecture to the web service setting, and presented the VyPR2 framework which implements our solutions. VyPR2 is a complete Runtime Verification framework for Flask-based web services written in Python; it provides the PyCFTL library for writing CFTL specifications over an entire web service, automatic minimal (with respect to reachability) instrumentation and efficient monitoring. Finally, we have described our experience using VyPR2 to analyse performance of the CMS Conditions Uploader, a critical part of the physics reconstruction pipeline of the CMS Experiment at CERN.

With the large amount of test data we have at CERN, we plan to extend VyPR2 to address explanation of violations of any part of a specification. This has been agreed within the CMS Experiment as being a significant step in developing the necessary software analysis tools ready for the upgraded LHC.

#### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Constraint-Based Monitoring of Hyperproperties**

Christopher Hahn, Marvin Stenger, and Leander Tentrup

Reactive Systems Group, Saarland University, Saarbrücken, Germany {hahn,stenger,tentrup}@react.uni-saarland.de

**Abstract.** Verifying hyperproperties at runtime is a challenging problem as hyperproperties, such as non-interference and observational determinism, relate multiple computation traces with each other. It is necessary to store previously seen traces, because every new incoming trace needs to be compatible with every run of the system observed so far. Furthermore, the new incoming trace poses requirements on *future* traces. In our monitoring approach, we focus on those requirements by rewriting a hyperproperty in the temporal logic HyperLTL to a Boolean constraint system. A hyperproperty is then violated by multiple runs of the system if the constraint system becomes unsatisfiable. We compare our implementation, which utilizes either BDDs or a SAT solver to store and evaluate constraints, to the automata-based monitoring tool RVHyper.

**Keywords:** Monitoring · Rewriting · Constraint-based · Hyperproperties

### **1 Introduction**

As today's complex and large-scale systems are usually far beyond the scope of classic verification techniques like model checking or theorem proving, we are in need of light-weight monitors for controlling the flow of information. By instrumenting efficient monitoring techniques in such systems that operate in an unpredictable privacy-critical environment, countermeasures will be enacted before irreparable information leaks happen. Information-flow policies, however, cannot be monitored with standard runtime verification techniques as they relate *multiple* runs of a system. For example, *observational determinism* [19,21,24] is a policy stating that altering non-observable input has no impact on the observable behavior. Hyperproperties [7] are a generalization of trace properties and are thus capable of expressing information-flow policies. HyperLTL [6] is a recently introduced temporal logic for hyperproperties, which extends Linear-time Temporal Logic (LTL) [20] with trace variables and explicit trace quantification. Observational determinism is expressed as the formula ∀π, π′. (*out*<sup>π</sup> ↔ *out*<sup>π′</sup>) W (*in*<sup>π</sup> ↮ *in*<sup>π′</sup>), stating that all traces π, π′ should agree on the output as long as they agree on the inputs.

This work was partially supported by the German Research Foundation (DFG) as part of the Collaborative Research Center "Methods and Tools for Understanding and Controlling Privacy" (CRC 1223) and the Collaborative Research Center "Foundations of Perspicuous Software Systems" (CRC 248), and by the European Research Council (ERC) Grant OSARES (No. 683300).

© The Author(s) 2019. T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 115–131, 2019. https://doi.org/10.1007/978-3-030-17465-1\_7

In contrast to classic trace property monitoring, where a single run suffices to determine a violation, in runtime verification of HyperLTL formulas we are concerned with whether a *set* of runs through a system violates a given specification. In the common setting, those runs are given sequentially to the runtime monitor [1,2,12,13], which determines if the given set of runs violates the specification. An alternative view on HyperLTL monitoring is that every new incoming trace poses requirements on future traces. For example, the event {*in*, *out*} in the observational determinism example above asserts that for every other trace, the output *out* has to be enabled if *in* is enabled. Approaches based on static automata constructions [1,12,13] perform very well on this type of specification, although their scalability is intrinsically limited by certain parameters: the automaton construction becomes a bottleneck for more complex specifications, especially with respect to the number of atomic propositions. Furthermore, the computational workload grows steadily with the number of incoming traces, as every trace seen so far has to be checked against every new trace. Even optimizations [12] that minimize the number of traces that must be stored turn out to be too coarse-grained, as the following example shows. Consider the monitoring of the HyperLTL formula ∀π, π′. □(a<sup>π</sup> → ¬b<sup>π′</sup>), which states that globally if a occurs on any trace π, then b is not allowed to hold on any trace π′, on the following incoming traces:


In prior work [12], we observed that traces that pose *fewer requirements* on future traces can safely be discarded from the monitoring process. In the example above, the requirements of trace 1 are dominated by the requirements of trace 2, namely that b is not allowed to hold on the first and second position of new incoming traces. Hence, trace 1 no longer needs to be stored in order to detect a violation. But with the language inclusion check proposed in [12], neither trace 2 nor trace 3 can be discarded, as they pose incomparable requirements. They have, however, overlapping constraints, that is, they both enforce ¬b in the first step.

To further improve the conciseness of the stored trace information, we use *rewriting*, which is a more fine-grained monitoring approach. The basic idea is to track the requirements that future traces have to fulfill, instead of storing a set of traces. In the example above, we would track the requirement that b is not allowed to hold on the first three positions of every freshly incoming trace. Rewriting has been applied successfully to trace properties, namely LTL formulas [17]. The idea is to partially evaluate a given LTL specification ϕ on an incoming event by unrolling ϕ according to the expansion laws of the temporal operators. The result of a single rewrite is again an LTL formula representing the updated specification, which the continuing execution has to satisfy. We use rewriting techniques to reduce ∀<sup>2</sup>HyperLTL formulas to LTL constraints and check those constraints for inconsistencies corresponding to violations.

In this paper, we introduce a complete and provably correct rewriting-based monitoring approach for ∀<sup>2</sup>HyperLTL formulas. Our algorithm rewrites a HyperLTL formula and a single event into a constraint composed of plain LTL and HyperLTL. For example, assume the event {*in*, *out*} while monitoring observational determinism as formalized above. The first step of the rewriting applies the expansion laws for the temporal operators, which results in (in<sub>π</sub> ↮ in<sub>π′</sub>) ∨ (out<sub>π</sub> ↔ out<sub>π′</sub>) ∧ ◯((out<sub>π</sub> ↔ out<sub>π′</sub>) W (in<sub>π</sub> ↮ in<sub>π′</sub>)). The event {in, out} is rewritten for atomic propositions indexed by the trace variable π. This means replacing each occurrence of in<sub>π</sub> or out<sub>π</sub> in the current expansion step, i.e., before the ◯ operator, with ⊤. Additionally, we strip the trace variable π′ in the current expansion step from all other atomic propositions. This leaves us with (⊤ ↮ in) ∨ (⊤ ↔ out) ∧ ◯((out<sub>π</sub> ↔ out<sub>π′</sub>) W (in<sub>π</sub> ↮ in<sub>π′</sub>)). After simplification we have ¬in ∨ out ∧ ◯((out<sub>π</sub> ↔ out<sub>π′</sub>) W (in<sub>π</sub> ↮ in<sub>π′</sub>)) as the new specification, which consists of a plain LTL part and a HyperLTL part. Based on this, we incrementally build a Boolean constraint system: we start by encoding the constraints corresponding to the LTL part and encode the HyperLTL part as variables. Those variables will then be incrementally defined when more elements of the trace become available. With this approach, we store only the information needed to detect violations of a given hyperproperty.

We evaluate two implementations of our approach, based on BDDs and SAT-solving, against RVHyper [13], a highly optimized automaton-based monitoring tool for temporal hyperproperties. Our experiments show that the rewriting approach performs equally well in general and better on a class of formulas which we call *guarded invariants*, i.e., formulas that define a certain invariant relation between two traces.

**Related Work.** With the need to express temporal hyperproperties in a succinct and formal manner, the above-mentioned temporal logics HyperLTL and HyperCTL\* [6] have been proposed. The model-checking [6,14,15], satisfiability [9], and realizability [10] problems of HyperLTL have been studied before.

Runtime verification of HyperLTL formulas was first considered for (co-)k-safety hyperproperties [1]. In the same paper, the notion of monitorability for HyperLTL was introduced. The authors also identified syntactic classes of HyperLTL formulas that are monitorable and proposed a monitoring algorithm based on a progression logic expressing trace interdependencies and the composition of an LTL<sub>3</sub> monitor.

Another automata-based approach for monitoring HyperLTL formulas was proposed in [12]. Given a HyperLTL specification, the algorithm starts by creating a deterministic monitor automaton. For every incoming trace it is then checked that all combinations with the already seen traces are accepted by the automaton. In order to minimize the number of stored traces, a language-inclusion-based algorithm is proposed, which allows pruning traces with redundant information. Furthermore, a method is proposed to reduce the number of trace combinations that have to be checked by analyzing the specification for relations such as reflexivity, symmetry, and transitivity with a HyperLTL-SAT solver [9,11]. The algorithm is implemented in the tool RVHyper [13], which was used to monitor information-flow policies and to detect spurious dependencies in hardware designs.

Another rewriting-based monitoring approach for HyperLTL is outlined in [5]. The idea is to identify a set of propositions of interest and aggregate constraints such that inconsistencies in the constraints indicate a violation of the HyperLTL formula. While the paper describes the building blocks for such a monitoring approach with a number of examples, we have, unfortunately, not been successful in applying the algorithm to other hyperproperties of interest, such as observational determinism.

In [3], the authors study the complexity of monitoring hyperproperties. They show that the form and size of the input, as well as the formula, have a significant impact on the feasibility of the monitoring process. They differentiate between several input forms and study their complexity: a set of linear traces, tree-shaped Kripke structures, and acyclic Kripke structures. For acyclic structures and alternation-free HyperLTL formulas, the problem's complexity gets as low as NC.

In [4], the authors discuss examples where static analysis can be combined with runtime verification techniques to monitor HyperLTL formulas beyond the alternation-free fragment. They discuss the challenges in monitoring formulas beyond this fragment and lay the foundations towards a general method.

### **2 Preliminaries**

Let *AP* be a finite set of *atomic propositions* and let Σ = 2<sup>*AP*</sup> be the corresponding *alphabet*. An infinite *trace* t ∈ Σ<sup>ω</sup> is an infinite sequence over the alphabet. A subset T ⊆ Σ<sup>ω</sup> is called a *trace property*. A *hyperproperty* H ⊆ 2<sup>(Σ<sup>ω</sup>)</sup> is a generalization of a trace property. A finite trace t ∈ Σ<sup>+</sup> is a finite sequence over Σ. In the case of finite traces, |t| denotes the length of a trace. We use the following notation to access and manipulate traces: Let t be a trace and i be a natural number. t[i] denotes the i-th element of t, counting from zero; thus t[0] represents the first element of the trace. Let j be a natural number. If j ≥ i and i < |t|, then t[i, j] denotes the sequence t[i]t[i + 1] ··· t[min(j, |t| − 1)]. Otherwise it denotes the empty trace ε. t[i⟩ denotes the suffix of t starting at position i. For two finite traces s and t, we denote their concatenation by s · t.
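The trace notation can be mirrored directly in code. The following is a minimal sketch, assuming events are modelled as frozen sets of proposition names and that the intended side condition for t[i, j] is j ≥ i and i < |t|; the helper names `segment` and `suffix` are illustrative and do not come from the paper.

```python
from typing import FrozenSet, Tuple

Event = FrozenSet[str]          # one letter of the alphabet 2^AP
Trace = Tuple[Event, ...]       # a finite trace over that alphabet


def segment(t: Trace, i: int, j: int) -> Trace:
    """t[i, j]: events i .. min(j, |t| - 1), or the empty trace otherwise."""
    if j >= i and i < len(t):
        return t[i:min(j, len(t) - 1) + 1]
    return ()


def suffix(t: Trace, i: int) -> Trace:
    """Suffix of t starting at position i."""
    return t[i:]


# Example trace {a} {} {a, b}; concatenation s . t is tuple addition.
t: Trace = (frozenset({"a"}), frozenset(), frozenset({"a", "b"}))
```

Representing traces as tuples keeps them hashable, which is convenient when traces or rewrite results are later used as dictionary keys.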

**HyperLTL Syntax.** HyperLTL [6] extends LTL with trace variables and trace quantifiers. Let V be a finite set of trace variables. The syntax of HyperLTL is given by the grammar

$$\begin{aligned} \varphi &:= \, \forall \pi. \,\varphi \mid \exists \pi. \,\varphi \mid \psi\\ \psi &:= \, a\_{\pi} \mid \psi \land \psi \mid \neg \psi \mid \mathsf{O} \,\psi \mid \psi \,\mathcal{U} \,\psi, \end{aligned}$$

where a ∈ *AP* is an atomic proposition and π ∈ V is a trace variable. Atomic propositions are indexed by trace variables. The explicit trace quantification enables us to express properties like "on all traces ϕ must hold", expressed by ∀π. ϕ. Dually, we can express "there exists a trace such that ϕ holds", expressed by ∃π. ϕ. We use the standard derived operators *release* ϕ R ψ := ¬(¬ϕ U ¬ψ), *eventually* ◇ϕ := *true* U ϕ, *globally* □ϕ := ¬◇¬ϕ, and *weak until* ϕ<sub>1</sub> W ϕ<sub>2</sub> := (ϕ<sub>1</sub> U ϕ<sub>2</sub>) ∨ □ϕ<sub>1</sub>. As we use the finite trace semantics, ◯ϕ denotes the *strong* version of the next operator, i.e., if a trace ends before the satisfaction of ϕ can be determined, the satisfaction relation, defined below, evaluates to false. To enable duality in the finite trace setting, we additionally use the *weak* next operator, written ◯<sub>w</sub>ϕ here, which evaluates to true if a trace ends before the satisfaction of ϕ can be determined and is defined as ◯<sub>w</sub>ϕ := ¬◯¬ϕ. We call ψ of a HyperLTL formula *Q*.ψ, with an arbitrary quantifier prefix *Q*, the *body* of the formula. A HyperLTL formula *Q*.ψ is in the *alternation-free fragment* if *Q* consists either solely of universal quantifiers or solely of existential quantifiers. We also denote the respective alternation-free fragments as the ∀<sup>n</sup> fragment and the ∃<sup>n</sup> fragment, with n being the number of quantifiers in the prefix.

**Finite Trace Semantics.** We recap the finite trace semantics for HyperLTL [5], which is itself based on the finite trace semantics of LTL [18]. In the following, when using ℒ(ϕ) we refer to the finite trace semantics of a HyperLTL formula ϕ. Let Π<sub>*fin*</sub> : V → Σ<sup>+</sup> be a partial function mapping trace variables to finite traces. We define ε[0] as the empty set. Π<sub>*fin*</sub>[i⟩ denotes the trace assignment that is equal to Π<sub>*fin*</sub>(π)[i⟩ for all π ∈ dom(Π<sub>*fin*</sub>). By slight abuse of notation, we write t ∈ Π<sub>*fin*</sub> to access traces t in the image of Π<sub>*fin*</sub>. The satisfaction of a HyperLTL formula ϕ over a finite trace assignment Π<sub>*fin*</sub> and a set of finite traces T, denoted by Π<sub>*fin*</sub> ⊨<sub>T</sub> ϕ, is defined as follows:

$$\begin{array}{ll}
\Pi\_{\text{fin}} \models\_{T} a\_{\pi} & \text{if } a \in \Pi\_{\text{fin}}(\pi)[0] \\
\Pi\_{\text{fin}} \models\_{T} \neg \varphi & \text{if } \Pi\_{\text{fin}} \not\models\_{T} \varphi \\
\Pi\_{\text{fin}} \models\_{T} \varphi \vee \psi & \text{if } \Pi\_{\text{fin}} \models\_{T} \varphi \text{ or } \Pi\_{\text{fin}} \models\_{T} \psi \\
\Pi\_{\text{fin}} \models\_{T} \mathsf{O} \varphi & \text{if } \forall t \in \Pi\_{\text{fin}}.\ |t| > 1 \text{ and } \Pi\_{\text{fin}}[1\rangle \models\_{T} \varphi \\
\Pi\_{\text{fin}} \models\_{T} \varphi \,\mathcal{U}\, \psi & \text{if } \exists i < \min\_{t \in \Pi\_{\text{fin}}} |t|.\ \Pi\_{\text{fin}}[i\rangle \models\_{T} \psi \wedge \forall j < i.\ \Pi\_{\text{fin}}[j\rangle \models\_{T} \varphi \\
\Pi\_{\text{fin}} \models\_{T} \exists \pi. \varphi & \text{if there is some } t \in T \text{ such that } \Pi\_{\text{fin}}[\pi \mapsto t] \models\_{T} \varphi \\
\Pi\_{\text{fin}} \models\_{T} \forall \pi. \varphi & \text{if for all } t \in T \text{ it holds that } \Pi\_{\text{fin}}[\pi \mapsto t] \models\_{T} \varphi
\end{array}$$

Due to the duality of U/R, ◯/◯<sub>w</sub> (strong and weak next), ∃/∀, and the standard Boolean operators, every HyperLTL formula ϕ can be transformed into negation normal form (NNF), i.e., for every ϕ there is some ψ in negation normal form such that for all Π<sub>*fin*</sub> and T it holds that Π<sub>*fin*</sub> ⊨<sub>T</sub> ϕ if, and only if, Π<sub>*fin*</sub> ⊨<sub>T</sub> ψ. The standard finite-trace LTL semantics, written t ⊨<sub>LTL<sub>fin</sub></sub> ϕ for some LTL formula ϕ, is equal to {π ↦ t}<sub>*fin*</sub> ⊨<sub>∅</sub> ϕ′, where ϕ′ is derived from ϕ by replacing every proposition p ∈ *AP* by p<sub>π</sub>.
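To make the difference between the strong and the weak next operator on finite traces concrete, the following sketch evaluates plain LTL formulas over a single finite trace, following the satisfaction clauses above for the one-trace case. The tuple encoding of formulas and the helper names (`holds`, `weak_next`, `eventually`, `globally`) are assumptions made for illustration only.

```python
TRUE = ("true",)


def holds(f, t):
    """Finite-trace LTL satisfaction t |= f for a non-empty trace t."""
    op = f[0]
    if op == "true":
        return True
    if op == "ap":
        return f[1] in t[0]
    if op == "not":
        return not holds(f[1], t)
    if op == "or":
        return holds(f[1], t) or holds(f[2], t)
    if op == "and":
        return holds(f[1], t) and holds(f[2], t)
    if op == "next":
        # strong next: false if the trace ends here
        return len(t) > 1 and holds(f[1], t[1:])
    if op == "until":
        # exists i < |t| with psi at i and phi at all j < i
        return any(
            holds(f[2], t[i:]) and all(holds(f[1], t[j:]) for j in range(i))
            for i in range(len(t))
        )
    raise ValueError(f"unknown operator: {op}")


def weak_next(f):
    """Weak next as the dual of strong next."""
    return ("not", ("next", ("not", f)))


def eventually(f):
    return ("until", TRUE, f)


def globally(f):
    return ("not", eventually(("not", f)))


a = ("ap", "a")
one = (frozenset({"a"}),)                 # trace {a}
two = (frozenset({"a"}), frozenset())     # trace {a} {}
```

On the one-event trace `one`, the strong next of `a` is false while the weak next of `a` is true, which is exactly the behaviour described for traces that end before satisfaction can be determined.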

### **3 Rewriting HyperLTL**

Given the body ϕ of a ∀<sup>2</sup>HyperLTL formula ∀π, π′. ϕ and a finite trace t ∈ Σ<sup>+</sup>, we define alternative language characterizations. These capture the intuitive idea that, if one fixes a finite trace t, the language of ∀π, π′. ϕ includes exactly those traces t′ that satisfy ϕ in conjunction with t.

$$\begin{array}{lcl} \mathcal{L}\_t^\pi(\varphi) &:= \left\{ t' \in \Sigma^+ \mid \{ \pi \mapsto t, \pi' \mapsto t' \}\_{\operatorname{fin}} \models \varphi \right\} \\ \mathcal{L}\_t^{\pi'}(\varphi) &:= \left\{ t' \in \Sigma^+ \mid \{ \pi \mapsto t', \pi' \mapsto t \}\_{\operatorname{fin}} \models \varphi \right\} \\ \mathcal{L}\_t(\varphi) &:= \mathcal{L}\_t^\pi(\varphi) \cap \mathcal{L}\_t^{\pi'}(\varphi) \end{array}$$

We call ϕ̂ := ϕ ∧ ϕ[π′/π, π/π′] the symmetric closure of ϕ, where ϕ[π′/π, π/π′] represents the expression ϕ in which the trace variables π and π′ are swapped. The language of the symmetric closure, when fixing one trace variable, is equivalent to the language of ϕ.

**Lemma 1.** *Given the body* ϕ *of a* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *and a finite trace* t ∈ Σ<sup>+</sup>*, it holds that* ℒ<sup>π</sup><sub>t</sub>(ϕ̂) = ℒ<sub>t</sub>(ϕ)*.*

$$\begin{split}
& \textit{Proof.} \\
& \mathcal{L}\_{t}^{\pi}(\hat{\varphi}) = \left\{ t' \in \Sigma^{+} \mid \{\pi \mapsto t, \pi' \mapsto t'\}\_{\operatorname{fin}} \models \hat{\varphi} \right\} \\
& \qquad = \left\{ t' \in \Sigma^{+} \mid \{\pi \mapsto t, \pi' \mapsto t'\}\_{\operatorname{fin}} \models \varphi \wedge \varphi[\pi'/\pi, \pi/\pi'] \right\} \\
& \qquad = \left\{ t' \in \Sigma^{+} \mid \{\pi \mapsto t, \pi' \mapsto t'\}\_{\operatorname{fin}} \models \varphi,\ \{\pi \mapsto t, \pi' \mapsto t'\}\_{\operatorname{fin}} \models \varphi[\pi'/\pi, \pi/\pi'] \right\} \\
& \qquad = \left\{ t' \in \Sigma^{+} \mid \{\pi \mapsto t, \pi' \mapsto t'\}\_{\operatorname{fin}} \models \varphi,\ \{\pi \mapsto t', \pi' \mapsto t\}\_{\operatorname{fin}} \models \varphi \right\} = \mathcal{L}\_{t}(\varphi)
\end{split}$$

We exploit this to rewrite a ∀<sup>2</sup>HyperLTL formula into an LTL formula. We define the projection ϕ|<sup>π</sup><sub>t</sub> of the body ϕ of a ∀<sup>2</sup>HyperLTL formula ∀π, π′. ϕ in NNF and a finite trace t ∈ Σ<sup>+</sup> to an LTL formula recursively on the structure of ϕ:

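$$\begin{array}{lcl}
a\_{\pi}|^{\pi}\_{t} &:=& \begin{cases} \top & \text{if } a \in t[0] \\ \bot & \text{otherwise} \end{cases} \\
(\neg a\_{\pi})|^{\pi}\_{t} &:=& \begin{cases} \top & \text{if } a \notin t[0] \\ \bot & \text{otherwise} \end{cases} \\
a\_{\pi'}|^{\pi}\_{t} &:=& a \\
(\neg a\_{\pi'})|^{\pi}\_{t} &:=& \neg a \\
(\varphi \vee \psi)|^{\pi}\_{t} &:=& \varphi|^{\pi}\_{t} \vee \psi|^{\pi}\_{t} \\
(\varphi \wedge \psi)|^{\pi}\_{t} &:=& \varphi|^{\pi}\_{t} \wedge \psi|^{\pi}\_{t} \\
(\mathsf{O}\varphi)|^{\pi}\_{t} &:=& \begin{cases} \bot & \text{if } |t| \leq 1 \\ \mathsf{O}(\varphi|^{\pi}\_{t[1\rangle}) & \text{otherwise} \end{cases} \\
(\mathsf{O}\_{w}\varphi)|^{\pi}\_{t} &:=& \begin{cases} \top & \text{if } |t| \leq 1 \\ \mathsf{O}\_{w}(\varphi|^{\pi}\_{t[1\rangle}) & \text{otherwise} \end{cases} \\
(\varphi \,\mathcal{U}\, \psi)|^{\pi}\_{t} &:=& \begin{cases} \psi|^{\pi}\_{t} & \text{if } |t| \leq 1 \\ \psi|^{\pi}\_{t} \vee \big(\varphi|^{\pi}\_{t} \wedge \mathsf{O}((\varphi \,\mathcal{U}\, \psi)|^{\pi}\_{t[1\rangle})\big) & \text{otherwise} \end{cases} \\
(\varphi \,\mathcal{R}\, \psi)|^{\pi}\_{t} &:=& \begin{cases} \psi|^{\pi}\_{t} & \text{if } |t| \leq 1 \\ \psi|^{\pi}\_{t} \wedge \big(\varphi|^{\pi}\_{t} \vee \mathsf{O}\_{w}((\varphi \,\mathcal{R}\, \psi)|^{\pi}\_{t[1\rangle})\big) & \text{otherwise} \end{cases}
\end{array}$$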
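The projection can be read as a recursive function over the formula and the fixed trace. The sketch below is an illustration under stated assumptions: formulas are nested tuples, `"X"` and `"Xw"` stand for strong and weak next, atoms carry the trace variable `"pi"` or `"pi'"`, and the next operators are placed in front of the recursive unrolling as in the expansion laws. The function names `project` and `holds` are hypothetical.

```python
TOP, BOT = ("true",), ("false",)


def project(f, t):
    """Projection of an NNF body f onto LTL for a fixed non-empty trace t."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "ap":                       # ("ap", name, trace variable)
        _, a, var = f
        if var == "pi":
            return TOP if a in t[0] else BOT
        return ("ap", a)
    if op == "not":                      # NNF: negation only in front of atoms
        _, a, var = f[1]
        if var == "pi":
            return BOT if a in t[0] else TOP
        return ("not", ("ap", a))
    if op in ("and", "or"):
        return (op, project(f[1], t), project(f[2], t))
    last = len(t) <= 1
    if op == "X":
        return BOT if last else ("X", project(f[1], t[1:]))
    if op == "Xw":
        return TOP if last else ("Xw", project(f[1], t[1:]))
    now = project(f[2], t)
    if op == "U":
        return now if last else (
            "or", now, ("and", project(f[1], t), ("X", project(f, t[1:]))))
    if op == "R":
        return now if last else (
            "and", now, ("or", project(f[1], t), ("Xw", project(f, t[1:]))))
    raise ValueError(op)


def holds(f, t):
    """Finite-trace LTL satisfaction for projected formulas (no U/R left)."""
    op = f[0]
    if op in ("true", "false"):
        return op == "true"
    if op == "ap":
        return f[1] in t[0]
    if op == "not":
        return not holds(f[1], t)
    if op == "and":
        return holds(f[1], t) and holds(f[2], t)
    if op == "or":
        return holds(f[1], t) or holds(f[2], t)
    if op == "X":
        return len(t) > 1 and holds(f[1], t[1:])
    if op == "Xw":
        return len(t) <= 1 or holds(f[1], t[1:])
    raise ValueError(op)


# Body of "globally (a_pi -> not b_pi')", written as false R (not a_pi or not b_pi').
body = ("R", BOT, ("or", ("not", ("ap", "a", "pi")), ("not", ("ap", "b", "pi'"))))
fixed = (frozenset({"a"}), frozenset())          # the fixed trace {a} {}
proj = project(body, fixed)
```

For the fixed trace {a} {}, the projected formula forbids b in the first position of any other trace and leaves the second position unconstrained.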

**Theorem 1.** *Given a* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *and any two finite traces* t, t′ ∈ Σ<sup>+</sup>*, it holds that* t′ ∈ ℒ<sup>π</sup><sub>t</sub>(ϕ) *if, and only if,* t′ ⊨<sub>LTL<sub>fin</sub></sub> ϕ|<sup>π</sup><sub>t</sub>*.*

*Proof.* By induction on the size of t. Induction Base (t = e, where e ∈ Σ): Let t′ ∈ Σ<sup>+</sup> be arbitrarily chosen. We distinguish by structural induction the following cases over the formula ϕ. We begin with the base cases.


The structural induction hypothesis states that ∀t′ ∈ Σ<sup>+</sup>. t′ ∈ ℒ<sup>π</sup><sub>t</sub>(ψ) ⇔ t′ ⊨<sub>LTL<sub>fin</sub></sub> ψ|<sup>π</sup><sub>t</sub> (SIH1), where ψ is a strict subformula of ϕ.


Induction Step (t = e · t<sup>∗</sup>, where e ∈ Σ, t<sup>∗</sup> ∈ Σ<sup>+</sup>): The induction hypothesis states that ∀t′ ∈ Σ<sup>+</sup>. t′ ∈ ℒ<sup>π</sup><sub>t<sup>∗</sup></sub>(ϕ) ⇔ t′ ⊨<sub>LTL<sub>fin</sub></sub> ϕ|<sup>π</sup><sub>t<sup>∗</sup></sub> (IH). We make use of structural induction over ϕ. All cases without temporal operators are covered as their proofs above were independent of |t|. The structural induction hypothesis states for all strict subformulas ψ that ∀t′ ∈ Σ<sup>+</sup>. t′ ∈ ℒ<sup>π</sup><sub>t</sub>(ψ) ⇔ t′ ⊨<sub>LTL<sub>fin</sub></sub> ψ|<sup>π</sup><sub>t</sub> (SIH2).


#### **4 Constraint-Based Monitoring**

For monitoring, we need to define an *incremental* rewriting that accurately models the semantics of ϕ|<sup>π</sup><sub>t</sub> while still being able to detect violations early. To this end, we define an operation ϕ[π, e, i], where e ∈ Σ is an event and i is the current position in the trace. ϕ[π, e, i] transforms ϕ into a propositional formula, where the variables are either indexed atomic propositions p<sub>i</sub> for p ∈ *AP*, or variables v<sup>−</sup><sub>ϕ′,i+1</sub> and v<sup>+</sup><sub>ϕ′,i+1</sub> that act as placeholders until new information about the trace comes in. Whenever the next event e′ occurs, the variables are defined with the result of ϕ′[π, e′, i + 1]. If the trace ends, the variables are set to *true* and *false* for v<sup>+</sup> and v<sup>−</sup>, respectively. We define ϕ[π, e, i] of a ∀<sup>2</sup>HyperLTL formula ∀π, π′. ϕ in NNF, event e ∈ Σ, and i ≥ 0 recursively on the structure of the body ϕ:

$$\begin{array}{lcllcl}
a\_{\pi}[\pi,e,i] &:=& \begin{cases} \top & \text{if } a \in e\\ \bot & \text{otherwise} \end{cases} & (\neg a\_{\pi})[\pi,e,i] &:=& \begin{cases} \top & \text{if } a \notin e\\ \bot & \text{otherwise} \end{cases} \\
a\_{\pi'}[\pi,e,i] &:=& a\_{i} & (\neg a\_{\pi'})[\pi,e,i] &:=& \neg a\_{i} \\
(\varphi\vee\psi)[\pi,e,i] &:=& \varphi[\pi,e,i]\vee\psi[\pi,e,i] & (\varphi\wedge\psi)[\pi,e,i] &:=& \varphi[\pi,e,i]\wedge\psi[\pi,e,i] \\
(\mathsf{O}\varphi)[\pi,e,i] &:=& v\_{\varphi,i+1}^{-} & (\mathsf{O}\_{w}\varphi)[\pi,e,i] &:=& v\_{\varphi,i+1}^{+}
\end{array}$$

$$\begin{array}{lcl}
(\varphi\,\mathcal{U}\,\psi)[\pi,e,i] &:=& \psi[\pi,e,i]\vee(\varphi[\pi,e,i]\wedge v\_{\varphi\mathcal{U}\psi,i+1}^{-}) \\
(\varphi\,\mathcal{R}\,\psi)[\pi,e,i] &:=& \psi[\pi,e,i]\wedge(\varphi[\pi,e,i]\vee v\_{\varphi\mathcal{R}\psi,i+1}^{+})
\end{array}$$
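A single rewrite step ϕ[π, e, i] can be sketched as a recursive function that returns a propositional formula over positional variables and placeholders. This is an illustrative encoding, not the authors' implementation: formulas are nested tuples, `("var", a, i)` stands for a<sub>i</sub>, `("v-", φ, i+1)` and `("v+", φ, i+1)` stand for the placeholders, and release uses the standard expansion ψ ∧ (ϕ ∨ weak next (ϕ R ψ)).

```python
TOP, BOT = ("true",), ("false",)


def rewrite(f, e, i):
    """One rewrite step f[pi, e, i] for an NNF body f and event e at position i."""
    op = f[0]
    if op in ("true", "false"):
        return f
    if op == "ap":                       # ("ap", name, trace variable)
        _, a, var = f
        if var == "pi":
            return TOP if a in e else BOT
        return ("var", a, i)             # positional variable a_i
    if op == "not":                      # NNF: negation only in front of atoms
        _, a, var = f[1]
        if var == "pi":
            return BOT if a in e else TOP
        return ("not", ("var", a, i))
    if op in ("and", "or"):
        return (op, rewrite(f[1], e, i), rewrite(f[2], e, i))
    if op == "X":                        # strong next: placeholder v^-
        return ("v-", f[1], i + 1)
    if op == "Xw":                       # weak next: placeholder v^+
        return ("v+", f[1], i + 1)
    if op == "U":
        return ("or", rewrite(f[2], e, i),
                ("and", rewrite(f[1], e, i), ("v-", f, i + 1)))
    if op == "R":
        return ("and", rewrite(f[2], e, i),
                ("or", rewrite(f[1], e, i), ("v+", f, i + 1)))
    raise ValueError(op)


# "globally (a_pi -> not b_pi')" as false R (not a_pi or not b_pi')
body = ("R", BOT, ("or", ("not", ("ap", "a", "pi")), ("not", ("ap", "b", "pi'"))))
step0 = rewrite(body, frozenset({"a"}), 0)
```

For the event {a} at position 0, `step0` requires ¬b<sub>0</sub> now and defers the rest of the invariant to the placeholder for position 1.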

We encode a ∀<sup>2</sup>HyperLTL formula and finite traces into a constraint system, which, as we will show, is satisfiable if and only if the given traces satisfy the formula w.r.t. the finite semantics of HyperLTL. We write v<sub>ϕ,i</sub> to denote either v<sup>−</sup><sub>ϕ,i</sub> or v<sup>+</sup><sub>ϕ,i</sub>. For e ∈ Σ and t ∈ Σ<sup>∗</sup>, we define

$$\begin{array}{lcl}
constr(v\_{\varphi,i}^{+},\epsilon) &:=& \top \\
constr(v\_{\varphi,i}^{-},\epsilon) &:=& \bot \\
constr(v\_{\varphi,i},e\cdot t) &:=& \varphi[\pi,e,i] \land \bigwedge\_{v\_{\psi,i+1}\in\varphi[\pi,e,i]} \left(v\_{\psi,i+1}\rightarrow constr(v\_{\psi,i+1},t)\right) \\
enc\_{\mathrm{AP}}^{i}(\epsilon) &:=& \top \\
enc\_{\mathrm{AP}}^{i}(e\cdot t) &:=& \bigwedge\_{a\in\mathrm{AP}\cap e} a\_{i} \ \wedge \bigwedge\_{a\in\mathrm{AP}\backslash e} \neg a\_{i} \ \wedge\ enc\_{\mathrm{AP}}^{i+1}(t),
\end{array}$$

where we use v<sub>ψ,i+1</sub> ∈ ϕ[π, e, i] to denote variables v<sub>ψ,i+1</sub> occurring in the propositional formula ϕ[π, e, i]. *enc* is used to transform a trace into a propositional formula, e.g., enc<sup>0</sup><sub>{a,b}</sub>({a}{a, b}) = a<sub>0</sub> ∧ ¬b<sub>0</sub> ∧ a<sub>1</sub> ∧ b<sub>1</sub>. For n = 0 we omit the annotation, i.e., we write enc<sub>AP</sub>(t) instead of enc<sup>0</sup><sub>AP</sub>(t). We also omit the index AP if it is clear from the context. By slight abuse of notation, we use constr<sup>n</sup>(ϕ, t) for some quantifier-free HyperLTL formula ϕ to denote constr(v<sub>ϕ,n</sub>, t) if |t| > 0. For a trace t′ ∈ Σ<sup>+</sup>, we use the notation enc(t′) ⊨ constr(ϕ, t), which evaluates to *true* if, and only if, enc(t′) ∧ constr(ϕ, t) is satisfiable.
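Since the encoding of a trace is a conjunction of literals, it can be represented as a partial truth assignment. The following sketch does this with a dictionary keyed by (proposition, position); the function name `enc` and the `start` parameter (playing the role of the superscript index) are chosen here for illustration.

```python
def enc(trace, ap, start=0):
    """Trace encoding as a conjunction of literals, stored as {(a, i): truth value}."""
    return {
        (a, start + k): (a in event)
        for k, event in enumerate(trace)
        for a in ap
    }


# The example from the text: the trace {a}{a, b} over AP = {a, b}.
example = enc((frozenset({"a"}), frozenset({"a", "b"})), {"a", "b"})
```

The dictionary `example` maps a<sub>0</sub>, a<sub>1</sub>, and b<sub>1</sub> to true and b<sub>0</sub> to false, matching a<sub>0</sub> ∧ ¬b<sub>0</sub> ∧ a<sub>1</sub> ∧ b<sub>1</sub>.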

#### **4.1 Algorithm**

Figure 1 depicts our constraint-based algorithm. Note that this algorithm can be used in both an offline and an online fashion. Before we give algorithmic details, consider again the observational determinism example from the introduction, which is expressed as the ∀<sup>2</sup>HyperLTL formula ∀π, π′. (out<sub>π</sub> ↔ out<sub>π′</sub>) W (in<sub>π</sub> ↮ in<sub>π′</sub>). The basic idea of the algorithm is to transform the HyperLTL formula to a formula consisting partially of LTL, which expresses the requirements of the incoming trace in the current step, and partially of HyperLTL. Assuming the event {in, out}, we transform the observational determinism formula to the following formula: ¬in ∨ out ∧ ◯((out<sub>π</sub> ↔ out<sub>π′</sub>) W (in<sub>π</sub> ↮ in<sub>π′</sub>)).

$$\begin{array}{rl}
 & \textbf{Input: } \forall \pi, \pi'.\, \varphi,\ T \subseteq \Sigma^{+} \\
 & \textbf{Output: } \textit{violation} \text{ or } \textit{no violation} \\
1 & \psi := \mathit{nnf}(\hat{\varphi}) \\
2 & C := \top \\
3 & \textbf{foreach } t \in T \textbf{ do} \\
4 & \quad C\_{t} := v\_{\psi,0} \\
5 & \quad t\_{enc} := \top \\
6 & \quad \textbf{while } e\_{i} := \mathit{getNextEvent}(t) \textbf{ do} \\
7 & \quad\quad t\_{enc} := t\_{enc} \wedge enc^{i}(e\_{i}) \\
8 & \quad\quad \textbf{foreach } v\_{\phi,i} \in C\_{t} \textbf{ do} \\
9 & \quad\quad\quad c := \phi[\pi, e\_{i}, i] \\
10 & \quad\quad\quad C\_{t} := C\_{t} \wedge (v\_{\phi,i} \rightarrow c) \\
11 & \quad\quad \textbf{if } \neg \mathit{sat}(C \wedge C\_{t} \wedge t\_{enc}) \textbf{ then} \\
12 & \quad\quad\quad \textbf{return } \textit{violation} \\
13 & \quad \textbf{foreach } v\_{\phi,i+1}^{+} \in C\_{t} \textbf{ do} \\
14 & \quad\quad C\_{t} := C\_{t} \wedge v\_{\phi,i+1}^{+} \\
15 & \quad \textbf{foreach } v\_{\phi,i+1}^{-} \in C\_{t} \textbf{ do} \\
16 & \quad\quad C\_{t} := C\_{t} \wedge \neg v\_{\phi,i+1}^{-} \\
17 & \quad C := C \wedge C\_{t} \\
18 & \textbf{return } \textit{no violation}
\end{array}$$

**Fig. 1.** Constraint-based algorithm for monitoring ∀<sup>2</sup>HyperLTL formulas.

A Boolean constraint system is then built incrementally: we start by encoding the constraints corresponding to the LTL part (in front of the next operator) and encode the HyperLTL part (after the next operator) as variables that are defined when more events of the trace come in. We continue by explaining the algorithm in detail. In line 1, we construct ψ as the negation normal form of the symmetric closure of the original formula. We build two constraint systems: C, containing the constraints of previous traces, and C<sub>t</sub> (built incrementally), containing the constraints for the current trace t. Consequently, we initialize C with ⊤ and C<sub>t</sub> with v<sub>ψ,0</sub> (lines 2 and 4). If the trace ends, we define the remaining v variables according to their polarities and add C<sub>t</sub> to C. For each new event e<sub>i</sub> in the trace t, and each "open" constraint in C<sub>t</sub> corresponding to step i, i.e., v<sub>φ,i</sub> ∈ C<sub>t</sub>, we rewrite the formula φ (line 9) and define v<sub>φ,i</sub> with the rewriting result, which potentially introduces new open constraints v<sub>φ′,i+1</sub> for the next step i + 1. The constraint encoding of the current trace is aggregated in the constraint t<sub>*enc*</sub> (line 7). If the constraint system, given the encoding of the current trace, turns out to be unsatisfiable, a violation of the specification is detected, which is then returned.
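The overall loop can be illustrated with a deliberately simplified offline sketch. It differs from the algorithm in Fig. 1 in several labeled ways: each trace is processed only once it is complete, placeholders are substituted eagerly by their definitions instead of being kept as named variables, satisfiability is decided by brute-force enumeration rather than by a BDD or SAT backend, and the body is assumed to be given in NNF already. All names (`swap`, `expand`, `sat`, `monitor`) and the tuple encoding of formulas are assumptions for illustration.

```python
from itertools import product

TOP, BOT = ("true",), ("false",)


def swap(f):
    """Exchange the trace variables pi and pi' (for the symmetric closure)."""
    if f[0] == "ap":
        return ("ap", f[1], "pi'" if f[2] == "pi" else "pi")
    return (f[0],) + tuple(swap(g) for g in f[1:])


def expand(f, t, i=0):
    """Constraint of a finished trace t for NNF body f, placeholders substituted."""
    def later(g, at_end):
        # placeholder for step i+1: next rewrite, or its polarity at trace end
        return expand(g, t, i + 1) if i + 1 < len(t) else at_end
    op, e = f[0], t[i]
    if op in ("true", "false"):
        return f
    if op == "ap":
        if f[2] == "pi":
            return TOP if f[1] in e else BOT
        return ("var", f[1], i)
    if op == "not":                      # NNF: negation only in front of atoms
        inner = expand(f[1], t, i)
        return {TOP: BOT, BOT: TOP}.get(inner, ("not", inner))
    if op in ("and", "or"):
        return (op, expand(f[1], t, i), expand(f[2], t, i))
    if op == "X":
        return later(f[1], BOT)
    if op == "Xw":
        return later(f[1], TOP)
    if op == "U":
        return ("or", expand(f[2], t, i), ("and", expand(f[1], t, i), later(f, BOT)))
    if op == "R":
        return ("and", expand(f[2], t, i), ("or", expand(f[1], t, i), later(f, TOP)))
    raise ValueError(op)


def variables(c):
    if c[0] == "var":
        return {(c[1], c[2])}
    return set().union(*(variables(g) for g in c[1:]))


def value(c, env):
    op = c[0]
    if op in ("true", "false"):
        return op == "true"
    if op == "var":
        return env[(c[1], c[2])]
    if op == "not":
        return not value(c[1], env)
    if op == "and":
        return value(c[1], env) and value(c[2], env)
    return value(c[1], env) or value(c[2], env)


def sat(constraints, fixed):
    """Brute-force satisfiability under the partial assignment `fixed`."""
    free = sorted(set().union(*(variables(c) for c in constraints)) - set(fixed))
    for bits in product([False, True], repeat=len(free)):
        env = {**fixed, **dict(zip(free, bits))}
        if all(value(c, env) for c in constraints):
            return True
    return False


def monitor(body, traces, ap):
    psi = ("and", body, swap(body))      # symmetric closure of the NNF body
    seen = []                            # constraints of earlier traces (C)
    for n, t in enumerate(traces):
        c_t = expand(psi, t)             # constraints of the current trace
        t_enc = {(a, i): a in e for i, e in enumerate(t) for a in ap}
        if not sat(seen + [c_t], t_enc):
            return ("violation", n)
        seen.append(c_t)
    return ("no violation", None)


A, B, E = frozenset({"a"}), frozenset({"b"}), frozenset()
body = ("R", BOT, ("or", ("not", ("ap", "a", "pi")), ("not", ("ap", "b", "pi'"))))
```

With `body` encoding "globally (a on π implies not b on π′)", the traces {a}{} and {}{b} are compatible, whereas a later trace {b}{} conflicts with the requirement ¬b<sub>0</sub> posed by the first trace. Because the check runs only at the end of each trace, this sketch reports violations later than an online monitor that checks after every event would.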

In the following, we sketch two algorithmic improvements. First, instead of storing the constraints corresponding to traces individually, we use a new data structure, which is a *tree maintaining nodes* of formulas, their corresponding variables, and child nodes. Such a node corresponds to already seen rewrites. The initial node captures the (transformed) specification (similar to line 4) and is also the root of the tree structure, which represents all the generated constraints and replaces C in Fig. 1. Whenever a trace deviates in its rewrite result, a new child or branch is added to the tree. If a rewrite result is already present in the node tree structure, there is no need to create any new constraints or new variables. This is crucial in case we observe many equal traces or traces behaving effectively the same. In case no new constraints were added to the constraint system, we omit a superfluous check for satisfiability.
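One possible reading of this node tree is a trie keyed by rewrite results: a trace walks down the tree, and a new node (with new constraints) is created only where its rewrite result has not been seen before. The sketch below is hypothetical; the class `Node`, the function `insert`, and the use of plain strings as stand-ins for rewrite results are illustrative choices, not the authors' data structure.

```python
class Node:
    """Tree node for one already seen rewrite result."""

    def __init__(self):
        self.children = {}      # rewrite result -> child node


def insert(root, rewrites):
    """Follow the rewrite results of one trace; return how many nodes were added."""
    node, created = root, 0
    for result in rewrites:
        if result not in node.children:
            node.children[result] = Node()   # new branch: new constraints needed
            created += 1
        node = node.children[result]
    return created


root = Node()
first = insert(root, ["r0", "r1", "r2"])     # fresh trace: three new nodes
repeat = insert(root, ["r0", "r1", "r2"])    # identical behaviour: nothing new
branch = insert(root, ["r0", "x1"])          # deviates at the second position
```

When `insert` returns 0, no constraints were added, which corresponds to the case in which the satisfiability check can be skipped.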

Second, we use *conjunct splitting* to exploit the node tree optimization even further. We illustrate the basic idea with an example. Consider ∀π, π′. ϕ with ϕ = □((a<sub>π</sub> ↔ a<sub>π′</sub>) ∨ (b<sub>π</sub> ↔ b<sub>π′</sub>)), which demands that on all executions, at each position, at least one of the propositions a or b agrees in its evaluation. Consider the two traces t<sub>1</sub> = {a}{a}{a} and t<sub>2</sub> = {a}{a, b}{a}, which satisfy the specification. As both traces feature the same first event, they also share the same rewrite result for the first position. Interestingly, on the second position, we get (a ∨ ¬b) ∧ s<sub>ϕ</sub> for t<sub>1</sub> and (a ∨ b) ∧ s<sub>ϕ</sub> for t<sub>2</sub> as the rewrite results. While these constraints are no longer equal, by the nature of invariants, both feature the same subterm on the right-hand side of the conjunction. We split the resulting constraint on its syntactic structure, such that we no longer have to introduce a branch in the tree.
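The effect of conjunct splitting on the example can be sketched as follows: each rewrite result is decomposed into its top-level conjuncts, and conjuncts that occur in both results are stored only once. The tuple encoding and the name `conjuncts` are assumptions for illustration, and `s_phi` is a placeholder symbol standing for the shared subterm s<sub>ϕ</sub>.

```python
def conjuncts(f):
    """Set of top-level conjuncts of a constraint."""
    if f[0] == "and":
        return conjuncts(f[1]) | conjuncts(f[2])
    return frozenset([f])


a, b = ("var", "a"), ("var", "b")
s_phi = ("placeholder", "s_phi")                 # stands for the shared subterm

r1 = ("and", ("or", a, ("not", b)), s_phi)       # rewrite result for t1
r2 = ("and", ("or", a, b), s_phi)                # rewrite result for t2

shared = conjuncts(r1) & conjuncts(r2)           # stored once
only_t2 = conjuncts(r2) - conjuncts(r1)          # the only genuinely new part
```

Only the disjunction (a ∨ b) is new for t<sub>2</sub>, while the invariant part is shared between both traces.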

### **4.2 Correctness**

In this technical subsection, we will formally prove correctness of our algorithm by showing that our incremental construction of the Boolean constraints is equisatisfiable to the HyperLTL rewriting presented in Sect. 3. We begin by showing that satisfiability is preserved when shifting the indices, as stated by the following lemma.

**Lemma 2.** *For any* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *over atomic propositions AP, any finite traces* t, t′ ∈ Σ<sup>+</sup>*, and* n ≥ 0*, it holds that* enc<sub>*AP*</sub>(t′) ⊨ constr(ϕ, t) ⇔ enc<sup>n</sup><sub>*AP*</sub>(t′) ⊨ constr<sup>n</sup>(ϕ, t)*.*

*Proof.* By renaming of the positional indices.

In the following lemma and corollary, we show that the semantics of the next operators matches the finite LTL semantics.

**Lemma 3.** *For any* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *over atomic propositions AP and any finite traces* t, t′ ∈ Σ<sup>+</sup>*, it holds that* enc(t′) ⊨ constr(◯ϕ, t) ⇔ enc(t′) ⊨ constr(v<sup>−</sup><sub>ϕ,1</sub>, t[1⟩) ⇔ enc(t′[1⟩) ⊨ constr(v<sup>−</sup><sub>ϕ,0</sub>, t[1⟩).

*Proof.* Let ϕ, t, t′ be given. It holds that constr(◯ϕ, t) = constr(v<sup>−</sup><sub>ϕ,1</sub>, t[1⟩) by definition. As constr(v<sup>−</sup><sub>ϕ,1</sub>, t[1⟩) by construction does not contain any variables with positional index 0, we only need to check satisfiability with respect to enc(t′[1⟩). Thus enc(t′) ⊨ constr(◯ϕ, t) ⇔ enc(t′) ⊨ constr(v<sup>−</sup><sub>ϕ,1</sub>, t[1⟩) ⇔ enc<sup>1</sup>(t′[1⟩) ⊨ constr(v<sup>−</sup><sub>ϕ,1</sub>, t[1⟩) ⇔ enc(t′[1⟩) ⊨ constr(v<sup>−</sup><sub>ϕ,0</sub>, t[1⟩), where the last step follows from Lemma 2.

**Corollary 1.** *For any* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *over atomic propositions AP and any finite traces* t, t′ ∈ Σ<sup>+</sup>*, it holds that* enc(t′) ⊨ constr(◯<sub>w</sub>ϕ, t) ⇔ enc(t′) ⊨ constr(v<sup>+</sup><sub>ϕ,1</sub>, t[1⟩) ⇔ enc(t′[1⟩) ⊨ constr(v<sup>+</sup><sub>ϕ,0</sub>, t[1⟩).

We will now state the correctness theorem, namely that our algorithm preserves the HyperLTL rewriting semantics.

**Theorem 2.** *For every* ∀<sup>2</sup>*HyperLTL formula* ∀π, π′. ϕ *in negation normal form over atomic propositions AP and any finite trace* t ∈ Σ<sup>+</sup>*, it holds that* ∀t′ ∈ Σ<sup>+</sup>. t′ ⊨<sub>LTL<sub>fin</sub></sub> ϕ|<sup>π</sup><sub>t</sub> ⇔ enc<sub>*AP*</sub>(t′) ⊨ constr(ϕ, t)*.*

*Proof.* By induction over the size of t. Induction Base (t = e, where e ∈ Σ): We choose t′ ∈ Σ<sup>+</sup> arbitrarily. We distinguish by structural induction the following cases over the formula ϕ:


The structural induction hypothesis states that ∀t′ ∈ Σ<sup>+</sup>. t′ ⊨<sub>LTL<sub>fin</sub></sub> ψ|<sup>π</sup><sub>t</sub> ⇔ enc(t′) ⊨ constr(ψ, t) (SIH1), where ψ is a strict subformula of ϕ.


Induction Step (t = e · t<sup>∗</sup>, where e ∈ Σ and t<sup>∗</sup> ∈ Σ<sup>+</sup>): The induction hypothesis states that ∀t′ ∈ Σ<sup>+</sup>. t′ ⊨<sub>LTL<sub>fin</sub></sub> ϕ|<sup>π</sup><sub>t<sup>∗</sup></sub> ⇔ enc(t′) ⊨ constr(ϕ, t<sup>∗</sup>) (IH). We make use of structural induction over ϕ. All base cases are covered as their proofs above are independent of |t|. The structural induction hypothesis states for all strict subformulas ψ that ∀t′ ∈ Σ<sup>+</sup>. t′ ⊨<sub>LTL<sub>fin</sub></sub> ψ|<sup>π</sup><sub>t</sub> ⇔ enc(t′) ⊨ constr(ψ, t) (SIH2).

$$\begin{array}{lcl}
\varphi \vee \psi: & & t' \models\_{\text{LTL}\_{\text{fin}}} (\varphi \vee \psi)|^{\pi}\_{t} \;\Leftrightarrow\; t' \models\_{\text{LTL}\_{\text{fin}}} \varphi|^{\pi}\_{t} \,\vee\, t' \models\_{\text{LTL}\_{\text{fin}}} \psi|^{\pi}\_{t} \\
& \overset{\text{SIH2}}{\Leftrightarrow} & enc(t') \models constr(\varphi,t) \,\vee\, enc(t') \models constr(\psi,t) \\
& \Leftrightarrow & enc(t') \models \Big(\varphi[\pi,e,0] \wedge \bigwedge\_{v\_{\varphi',1} \in \varphi[\pi,e,0]} v\_{\varphi',1} \to constr(v\_{\varphi',1},t^{\*})\Big) \\
& & \vee\ enc(t') \models \Big(\psi[\pi,e,0] \wedge \bigwedge\_{v\_{\psi',1} \in \psi[\pi,e,0]} v\_{\psi',1} \to constr(v\_{\psi',1},t^{\*})\Big) \\
& \overset{\dagger}{\Leftrightarrow} & enc(t') \models (\varphi[\pi,e,0] \vee \psi[\pi,e,0]) \\
& & \wedge \bigwedge\_{v\_{\varphi',1} \in \varphi[\pi,e,0]} v\_{\varphi',1} \to constr(v\_{\varphi',1},t^{\*}) \\
& & \wedge \bigwedge\_{v\_{\psi',1} \in \psi[\pi,e,0]} v\_{\psi',1} \to constr(v\_{\psi',1},t^{\*}) \\
& \Leftrightarrow & enc(t') \models (\varphi \vee \psi)[\pi,e,0] \wedge \bigwedge\_{v\_{\chi,1} \in (\varphi \vee \psi)[\pi,e,0]} v\_{\chi,1} \to constr(v\_{\chi,1},t^{\*}) \\
& \Leftrightarrow & enc(t') \models constr(\varphi \vee \psi, t)
\end{array}$$

†: ⇐: trivial. ⇒: Assume a model M<sub>ϕ</sub> for enc(t′) ⊨ ϕ[π, e, 0] ∧ A, where A denotes the conjunction of the implications v<sub>ϕ′,1</sub> → constr(v<sub>ϕ′,1</sub>, t<sup>∗</sup>) for ϕ. By construction, the constraints generated by ϕ do not share variables with the constraints generated by ψ. We extend the model by assigning ⊥ to v<sub>ψ′,1</sub> for all v<sub>ψ′,1</sub> ∈ ψ[π, e, 0] and assigning the rest of the variables in ψ[π, e, 0] arbitrarily.

$$\begin{array}{lcl}
\mathsf{O}\varphi: & & t' \models\_{\text{LTL}\_{\text{fin}}} (\mathsf{O}\varphi)|^{\pi}\_{t} \;\Leftrightarrow\; t' \models\_{\text{LTL}\_{\text{fin}}} \mathsf{O}(\varphi|^{\pi}\_{t^{\*}}) \;\Leftrightarrow\; t'[1\rangle \models\_{\text{LTL}\_{\text{fin}}} \varphi|^{\pi}\_{t^{\*}} \\
& \overset{\text{IH}}{\Leftrightarrow} & enc(t'[1\rangle) \models constr(\varphi,t^{\*}) \;\overset{\text{Lem. 3}}{\Leftrightarrow}\; enc(t') \models constr(\mathsf{O}\varphi,t) \\
\varphi \,\mathcal{U}\, \psi: & & t' \models\_{\text{LTL}\_{\text{fin}}} (\varphi \,\mathcal{U}\, \psi)|^{\pi}\_{t} \;\Leftrightarrow\; t' \models\_{\text{LTL}\_{\text{fin}}} \psi|^{\pi}\_{t} \,\vee\, \big(t' \models\_{\text{LTL}\_{\text{fin}}} \varphi|^{\pi}\_{t} \,\wedge\, t'[1\rangle \models\_{\text{LTL}\_{\text{fin}}} (\varphi \,\mathcal{U}\, \psi)|^{\pi}\_{t^{\*}}\big) \\
& \Leftrightarrow & enc(t') \models constr(\psi,t) \\
& & \vee\ \big(enc(t') \models constr(\varphi,t) \,\wedge\, enc(t') \models constr(v\_{\varphi\mathcal{U}\psi,1}^{-},t^{\*})\big) \\
& \Leftrightarrow & enc(t') \models \psi[\pi,e,0] \wedge \bigwedge\_{v\_{\psi',1} \in \psi[\pi,e,0]} \cdots
\end{array}$$

– ϕ ∧ ψ, ϕ, and ϕ R ψ are proven analogously.

**Corollary 2.** *For any $\forall^2$HyperLTL formula $\forall\pi, \pi'.\ \varphi$ in negation normal form over atomic propositions $AP$ and any finite traces $t, t' \in \Sigma^+$, it holds that $t' \in \mathcal{L}\_t(\varphi) \Leftrightarrow enc\_{AP}(t') \models constr(\hat{\varphi}, t)$.*

*Proof.*

$$t' \in \mathcal{L}\_t(\varphi) \stackrel{\text{Thm. 1}}{\Longleftrightarrow} t' \models\_{\text{LTL}\_{fin}} \hat{\varphi}|\_{t}^{\pi} \stackrel{\text{Lem. 2}}{\Longleftrightarrow} enc(t') \models constr(\hat{\varphi}, t).$$

**Lemma 4.** *For any $\forall^2$HyperLTL formula $\forall\pi, \pi'.\ \varphi$ in negation normal form over atomic propositions $AP$ and any finite traces $t, t' \in \Sigma^+$, it holds that $enc\_{AP}(t') \not\models constr(\varphi, t) \Rightarrow \forall t'' \in \Sigma^+.\ t' \le t'' \rightarrow enc\_{AP}(t'') \not\models constr(\varphi, t)$.*

*Proof.* We prove this by contradiction. We choose $t$, $t'$, and $\varphi$ arbitrarily, but such that $enc(t') \not\models constr(\varphi, t)$ holds. Assume that there exists a continuation of $t'$, which we call $t''$, for which $enc(t'') \models constr(\varphi, t)$ holds. Then there exists a model assigning truth values to the variables in $constr(\varphi, t)$ such that the constraint system is consistent. From this model we extract all truth values assigned to positional variables for positions $|t'|$ to $|t''| - 1$. As $t'$ is a prefix of $t''$, we can use these truth values to construct a valid model for $enc(t') \models constr(\varphi, t)$, which is a contradiction.

**Fig. 2.** Runtime comparison between RVHyper and our constraint-based monitor on a non-interference specification with traces of varying input size.

**Corollary 3.** *For any $\forall^2$HyperLTL formula $\forall\pi, \pi'.\ \varphi$ in negation normal form over atomic propositions $AP$, any finite set of finite traces $T \in \mathcal{P}(\Sigma^+)$, and any finite trace $t' \in \Sigma^+$, it holds that*

$$t' \in \bigcap\_{t \in T} \mathcal{L}\_t(\varphi) \quad \Longleftrightarrow \quad enc\_{AP}(t') \models \bigwedge\_{t \in T} constr(\hat{\varphi}, t).$$

*Proof.* It holds that $\forall t, t' \in \Sigma^+.\ t \neq t' \rightarrow constr(\varphi, t) \neq constr(\varphi, t')$. The claim follows by the same reasoning as in the earlier proofs, combined with Corollary 2.

#### **5 Experimental Evaluation**

We implemented two versions of the algorithm presented in this paper. The first implementation encodes the constraint system as a Boolean satisfiability problem (SAT), whereas the second one represents it as a (reduced ordered) binary decision diagram (BDD). The formula rewriting is implemented in a Maude [8] script. The constraint system is solved by either CryptoMiniSat [23] or CUDD [22]. All benchmarks were executed on an Intel Core i5-6200U CPU @2.30 GHz with 8 GB of RAM. The set of benchmarks chosen for our evaluation consists of two benchmarks presented in earlier publications [12,13] plus instances of *guarded invariants*, at which our implementations excel.

**Non-interference.** Non-interference [16,19] is an important information flow policy demanding that an observer of a system cannot infer any high security input of a system by observing only low security input and output. Reformulated, we could also say that all low security outputs $o^{low}$ have to be equal on all system executions as long as the low security inputs $i^{low}$ of those executions are the same: $\forall \pi, \pi'.\ (o^{low}\_{\pi} \leftrightarrow o^{low}\_{\pi'}) \,\mathcal{W}\, (i^{low}\_{\pi} \nleftrightarrow i^{low}\_{\pi'})$. This class of benchmarks was used to evaluate RVHyper [13], an automata-based runtime verification tool

**Fig. 3.** Runtime comparison between RVHyper and our constraint-based monitor on the guarded invariant benchmark with trace lengths 20, 20 bit input size.

**Table 1.** Average results of our implementation compared to RVHyper on traces generated from circuit instances. Every instance was run 10 times.


for HyperLTL formulas. We repeated the experiments and depict the results in Fig. 2. We chose a trace length of 50 and monitored non-interference on 1000 randomly generated traces, where we distinguish between a 64-bit input (left) and a 128-bit input (right). For the 64-bit input, our BDD implementation performs comparably to RVHyper, which statically constructs a monitor automaton. For the 128-bit input, RVHyper was not able to construct the automaton in reasonable time. Our implementation, however, shows promising results for this benchmark class, which pushes the automata-based construction to its limits.
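The non-interference property monitored in this benchmark can be illustrated with a minimal sketch. The code below is neither the constraint-based algorithm of this paper nor RVHyper's automaton construction; it is a naive pairwise check under an assumed finite-trace reading of the weak-until formula, where each trace is a list of (low input, low output) pairs. The names `weak_until_holds` and `violating_pairs` are hypothetical.

```python
from itertools import combinations

def weak_until_holds(t1, t2):
    """Check (o_low equal) W (i_low differ) on a pair of finite traces.

    Each trace is a list of (low_input, low_output) tuples. Assumption:
    positions beyond the shorter trace are ignored (finite-trace reading).
    """
    for (i1, o1), (i2, o2) in zip(t1, t2):
        if i1 != i2:
            return True   # inputs differ: the obligation is released
        if o1 != o2:
            return False  # same inputs so far, but outputs diverge
    return True           # outputs agreed on the whole common prefix

def violating_pairs(traces):
    """Naively compare every pair of stored traces (quadratic in len(traces))."""
    return [(a, b) for a, b in combinations(range(len(traces)), 2)
            if not weak_until_holds(traces[a], traces[b])]
```

Such a check has to store every trace and revisit all pairs, which is the kind of cost that a constraint-based monitor is designed to avoid.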

**Detecting Spurious Dependencies in Hardware Designs.** The problem of whether input signals influence output signals in hardware designs was considered in [13]. Formally, we specify this property as the following HyperLTL formula: $\forall \pi\_1 \forall \pi\_2.\ (o\_{\pi\_1} \leftrightarrow o\_{\pi\_2}) \,\mathcal{W}\, (\overline{i}\_{\pi\_1} \nleftrightarrow \overline{i}\_{\pi\_2})$, where $\overline{i}$ denotes all inputs except $i$. Intuitively, the formula asserts that for every pair of execution traces $(\pi\_1, \pi\_2)$ the value of $o$ has to be the same until there is a difference between $\pi\_1$ and $\pi\_2$ in the input vector $\overline{i}$, i.e., the inputs on which $o$ may depend. We consider the same hardware and specifications as in [13]. The results are depicted in Table 1. Again, the BDD implementation handles this set of benchmarks well.

**Fig. 4.** Runtime of the SAT-based algorithm on the guarded invariant benchmark with a varying number of atomic propositions.

The biggest difference can be seen between the runtimes for counter2. This is explained by the fact that this benchmark demands the highest number of observed traces, and therefore the impact of the quadratic runtime cost in the number of traces dominates the result. We can, in fact, clearly observe this correlation between the number of traces and the runtime in RVHyper's performance over all benchmarks. Our constraint-based implementations, on the other hand, do not show this behavior.

**Guarded Invariants.** We consider a new class of benchmarks, called *guarded invariants*, which express a certain invariant relation between two traces that is additionally guarded by a precondition. Figure 3 shows the results of monitoring an arbitrary invariant $P : \Sigma \to \mathbb{B}$ of the following form: $\forall \pi, \pi'.\ (\bigvee\_{i \in I} i\_{\pi} \nleftrightarrow i\_{\pi'}) \rightarrow (P(\pi) \leftrightarrow P(\pi'))$. Our approach significantly outperforms RVHyper on this benchmark class, as the conjunct splitting optimization, described in Sect. 4.1, synergizes well with SAT-solver implementations.

**Atomic Proposition Scalability.** While RVHyper is inherently limited in its scalability concerning formula size, as the construction of the deterministic monitor automaton gets increasingly hard, the rewrite-based solution is not affected by this limitation. To put this to the test, we ran the SAT-based implementation on guarded invariant formulas with up to 100 different atomic propositions. Formulas have the form $\forall \pi, \pi'.\ (\bigwedge\_{i=1}^{n\_{in}} (in\_{i,\pi} \leftrightarrow in\_{i,\pi'})) \rightarrow (\bigvee\_{j=1}^{n\_{out}} (out\_{j,\pi} \leftrightarrow out\_{j,\pi'}))$, where $n\_{in}$ and $n\_{out}$ represent the number of input and output atomic propositions, respectively. Results can be seen in Fig. 4. Note that RVHyper already fails to build monitor automata for $|n\_{in} + n\_{out}| > 10$.

### **6 Conclusion**

We continued the success story of rewrite-based monitors for trace properties by applying the technique to the runtime verification problem of hyperproperties. We presented an algorithm that, given a $\forall^2$HyperLTL formula, incrementally constructs constraints that represent requirements on future traces, instead of storing traces during runtime. Our evaluation shows that our approach scales in parameters where existing automata-based approaches reach their limits.

**Acknowledgments.** We thank Bernd Finkbeiner for his valuable feedback on earlier versions of this paper.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Hybrid and Stochastic Systems

# **Tail Probabilities for Randomized Program Runtimes via Martingales for Higher Moments**

Satoshi Kura<sup>1,2(✉)</sup>, Natsuki Urabe<sup>1,2</sup>, and Ichiro Hasuo<sup>2,3</sup>

<sup>1</sup> Department of Computer Science, University of Tokyo, Tokyo, Japan
kurasatoshi@is.s.u-tokyo.ac.jp
<sup>2</sup> National Institute of Informatics, Tokyo, Japan
<sup>3</sup> The Graduate University for Advanced Studies (SOKENDAI), Kanagawa, Japan

**Abstract.** Programs with randomization constructs are an active research topic, especially after the recent introduction of martingale-based analysis methods for their termination and runtimes. Unlike most of the existing works that focus on proving almost-sure termination or estimating the expected runtime, in this work we study the *tail probabilities* of runtimes, such as "the execution takes more than 100 steps with probability at most 1%." To this goal, we devise a theory of supermartingales that overapproximate *higher moments* of runtime. These higher moments, combined with a suitable concentration inequality, yield useful upper bounds of tail probabilities. Moreover, our vector-valued formulation enables automated template-based synthesis of those supermartingales. Our experiments suggest the method's practical use.

### **1 Introduction**

The important roles of *randomization* in algorithms and software systems are nowadays well-recognized. In algorithms, randomization can bring remarkable speed gain at the expense of small probabilities of imprecision. In cryptography, many encryption algorithms are randomized in order to conceal the identity of plaintexts. In software systems, randomization is widely utilized for the purpose of fairness, security and privacy.

Embracing randomization in programming languages has therefore been an active research topic for a long time. Doing so does not only offer a solid infrastructure that programmers and system designers can rely on, but also opens up the possibility of *language-based, static* analysis of properties of randomized algorithms and systems.

The current paper's goal is to analyze imperative programs with randomization constructs; the latter come in two forms, namely probabilistic branching and assignment from a designated, possibly continuous, distribution. We shall refer to such programs as *randomized programs*.<sup>1</sup>

**Runtime and Termination Analysis of Randomized Programs.** The *runtime* of a randomized program is often a problem of our interest; so is *almost-sure termination*, that is, whether the program terminates with probability 1. In the programming language community, these problems have been taken up by many researchers as a challenge of both practical importance and theoretical interest.

Most of the existing works on runtime and termination analysis follow either of the following two approaches.


The essential difference between the two approaches is small: an invariant notion in the latter is easily seen to be an adaptation of a suitable notion of supermartingale. The work [33] presents a comprehensive account of the order-theoretic foundation behind these techniques.

These existing works are mostly focused on the following problems: deciding almost-sure termination, computing termination probabilities, and computing expected runtime. (Here "computing" includes giving upper/lower bounds.) See [33] for a comparison of some of the existing martingale-based methods.

**Our Problem: Tail Probabilities for Runtimes.** In this paper we focus on the problem of *tail probabilities*, which has not been studied much so far.<sup>2</sup> We present a method for *overapproximating* tail probabilities; here is the problem we solve.

**Input:** a randomized program $\Gamma$, and a *deadline* $d \in \mathbb{N}$

**Output:** an upper bound of the *tail probability* $\Pr(T\_{run} \ge d)$, where $T\_{run}$ is the runtime of $\Gamma$

Our target language is an imperative language that features randomization (probabilistic branching and random assignment). We also allow nondeterminism; this makes the program's runtime depend on the choice of a *scheduler* (i.e. how nondeterminism is resolved). In this paper we study the longest, worst-case runtime (therefore our scheduler is *demonic*). In the technical sections, we present these programs as *probabilistic control flow graphs (pCFGs)*, as is usual in the literature. See e.g. [1,33].

<sup>1</sup> With the rise of statistical machine learning, *probabilistic programs* attract a lot of attention. Randomized programs can be thought of as a fragment of probabilistic programs without *conditioning* (or *observation*) constructs. In other words, the Bayesian aspect of probabilistic programs is absent in randomized programs.

<sup>2</sup> An exception is [5]; see Sect. 7 for comparison with the current work.

An example of our target program is in Fig. 1. It is an imperative program with randomization: in Line 3, the value of z is sampled from the uniform distribution over the interval [−2, 1]. The symbol ∗ in Line 4 stands for a nondeterministic Boolean value; in our analysis, it is resolved so that the runtime becomes the longest.

```
1 x := 2; y := 2;
2 while (x > 0 && y > 0) do
3   z := Unif (-2,1);
4   if * then
5     x := x + z
6   else
7     y := y + z
8   fi
9 od
```
**Fig. 1.** An example program

Given the program in Fig. 1 and a choice of a deadline (say $d = 400$), we can ask the question "what is the probability $\Pr(T\_{run} \ge d)$ for the runtime $T\_{run}$ of the program to exceed $d = 400$ steps?" As we show in Sect. 6, our method gives a guaranteed upper bound 0.0684. This means that, if we allow a time budget of $d = 400$ steps, the program terminates with probability at least 93%.

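To get a feel for the quantity being bounded, the example program can be simulated. The following is a minimal Monte Carlo sketch under stated assumptions: the scheduler is one fixed heuristic (always update the larger variable), and runtime is counted in loop iterations rather than pCFG steps. Neither choice comes from the paper, and the names `run_length` and `estimate_tail` are hypothetical. An empirical estimate under a single scheduler is not a guaranteed bound; it can only suggest what the demonic tail probability is at least.

```python
import random

def run_length(rng, max_iters=10_000):
    """Count loop iterations of the example program under one fixed scheduler.

    Assumptions (not from the paper): the scheduler always updates the larger
    of x and y, and runtime is measured in loop iterations.
    """
    x, y = 2.0, 2.0
    iters = 0
    while x > 0 and y > 0 and iters < max_iters:
        z = rng.uniform(-2.0, 1.0)
        if x >= y:        # this scheduler's resolution of the nondeterministic '*'
            x += z
        else:
            y += z
        iters += 1
    return iters

def estimate_tail(d, samples=10_000, seed=0):
    """Empirical frequency of runs that take at least d iterations."""
    rng = random.Random(seed)
    hits = sum(run_length(rng) >= d for _ in range(samples))
    return hits / samples
```

Calling `estimate_tail(d)` for several deadlines gives an empirical tail curve that can be compared against a guaranteed upper bound.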

**Fig. 2.** Our workflow

**Our Method: Concentration Inequalities, Higher Moments, and Vector-Valued Supermartingales.** Towards the goal of computing tail probabilities, our approach is to use *concentration inequalities*, a technique from probability theory that is commonly used for overapproximating various tail probabilities. There are various concentration inequalities in the literature, and each of them is applicable in a different setting, such as a nonnegative random variable (Markov's inequality), known mean and variance (Chebyshev's inequality), a difference-bounded martingale (Azuma's inequality), and so on. Some of them were used for analyzing randomized programs [5] (see Sect. 7 for comparison).

In this paper, we use a specific concentration inequality that uses *higher moments* $\mathbb{E}[T\_{run}], \ldots, \mathbb{E}[(T\_{run})^K]$ of runtimes $T\_{run}$, up to a choice of the maximum degree $K$. The concentration inequality is taken from [3]; it generalizes Markov's and Chebyshev's. We observe that a higher moment yields a tighter bound of the tail probability, as the deadline $d$ grows bigger. Therefore it makes sense to strive for computing higher moments.

For computing higher moments of runtimes, we systematically extend the existing theory of ranking supermartingales, from the expected runtime (i.e. the first moment) to higher moments. The theory features a *vector-valued* supermartingale, which not only generalizes easily to degrees up to arbitrary $K \in \mathbb{N}$, but also allows automated synthesis much like usual supermartingales.
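The effect of higher moments on the bound can be sketched with the simplest inequality of this kind. The code below does not reproduce the inequality from [3]; it assumes only the standard fact that, for a nonnegative runtime $T$ and $d > 0$, Markov's inequality applied to $T^k$ gives $\Pr(T \ge d) \le \mathbb{E}[T^k]/d^k$ for every $k$. The function name `tail_bound` and the example moments (those of a geometric runtime with success probability 1/2) are illustrative assumptions.

```python
def tail_bound(moment_bounds, d):
    """Upper-bound Pr(T >= d) from upper bounds on E[T^1], ..., E[T^K].

    Applies Markov's inequality to T^k for each k and keeps the best result.
    """
    if d <= 0:
        return 1.0
    best = 1.0  # a probability never exceeds 1
    for k, m_k in enumerate(moment_bounds, start=1):
        best = min(best, m_k / d ** k)
    return best

# Geometric runtime with success probability 1/2: E[T] = 2, E[T^2] = 6, E[T^3] = 26.
moments = [2.0, 6.0, 26.0]
for K in (1, 2, 3):
    print(K, tail_bound(moments[:K], d=10))
```

With these numbers the bound tightens as more moments are supplied, which matches the observation above that higher moments pay off for larger deadlines.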

We also claim that the soundness of these vector-valued supermartingales is proved in a mathematically clean manner. Following our previous work [33], our arguments are based on the order-theoretic foundation of fixed points (namely the Knaster-Tarski, Cousot–Cousot and Kleene theorems), and we give upper bounds of higher moments by suitable least fixed points.

Overall, our workflow is as shown in Fig. 2. We note that step 2 in Fig. 2 is computationally much cheaper than step 1: in fact, step 2 yields a symbolic expression for an upper bound in which $d$ is a free variable. This makes it possible to draw graphs like the ones in Fig. 3. It is also easy to find a deadline $d$ for which $\Pr(T\_{run} \ge d)$ is below a given threshold $p \in [0, 1]$.
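Once moment bounds are available, searching for a deadline is cheap because only the bound needs to be re-evaluated. The sketch below again uses the simple Markov-style bound $\min_k \mathbb{E}[T^k]/d^k$ as a stand-in for the inequality used in the paper. The names `tail_bound` and `smallest_deadline`, the search limit `d_max`, and the example moments are assumptions for illustration.

```python
def tail_bound(moment_bounds, d):
    """min(1, min_k m_k / d^k): a Markov-style upper bound on Pr(T >= d)."""
    best = 1.0
    for k, m_k in enumerate(moment_bounds, start=1):
        best = min(best, m_k / d ** k)
    return best

def smallest_deadline(moment_bounds, p, d_max=10**6):
    """Smallest integer d in [1, d_max] whose bound is at most p, or None.

    Binary search is valid because the bound is non-increasing in d
    whenever all moment bounds are nonnegative.
    """
    if tail_bound(moment_bounds, d_max) > p:
        return None
    lo, hi = 1, d_max
    while lo < hi:
        mid = (lo + hi) // 2
        if tail_bound(moment_bounds, mid) <= p:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

For example, `smallest_deadline([2.0, 6.0, 26.0], 0.05)` searches for the first deadline at which the bound drops to 5% or below.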

We implemented a prototype that synthesizes vector-valued supermartingales using linear and polynomial templates. The resulting constraints are solved by LP and SDP solvers, respectively. Experiments show that our method can produce nontrivial upper bounds in reasonable computation time. We also experimentally confirm that higher moments are useful in producing tighter bounds.

**Our Contributions.** Summarizing, the contribution of this paper is as follows.


**Organization.** We give preliminaries in Sect. 2. In Sect. 3, we review the order-theoretic characterization of ordinary ranking supermartingales and present an extension to higher moments of runtimes. In Sect. 4, we discuss how to obtain an upper bound of the tail probability of runtimes. In Sect. 5, we explain an automated synthesis algorithm for our ranking supermartingales. In Sect. 6, we give experimental results. In Sect. 7, we discuss related work. We conclude and give future work in Sect. 8. Some proofs and details are deferred to the appendices available in the extended version [22].

### **2 Preliminaries**

We present some preliminary materials, including the definition of pCFGs (we use them as a model of randomized programs) and the definition of runtime.

Given topological spaces $X$ and $Y$, let $\mathcal{B}(X)$ be the set of Borel sets on $X$ and $\mathcal{B}(X, Y)$ be the set of Borel measurable functions $X \to Y$. We assume that the set $\mathbb{R}$ of reals, a finite set $L$ and the set $[0,\infty]$ are equipped with the usual topology, the discrete topology, and the order topology, respectively. We use the induced Borel structures for these spaces. Given a measurable space $X$, let $\mathcal{D}(X)$ be the set of probability measures on $X$. For any $\mu \in \mathcal{D}(X)$, let $\mathrm{supp}(\mu)$ be the support of $\mu$. We write $\mathbb{E}[X]$ for the expectation of a random variable $X$.

Our use of pCFGs follows recent works including [1].

**Definition 2.1 (pCFG).** A *probabilistic control flow graph (pCFG)* is a tuple $\Gamma = (L, V, l\_{init}, \boldsymbol{x}\_{init}, \mapsto, \mathrm{Up}, \Pr, G)$ that consists of the following.


The update function can be decomposed into three functions $\mathrm{Up}\_D : L\_{AD} \to V \times \mathcal{B}(\mathbb{R}^V, \mathbb{R})$, $\mathrm{Up}\_P : L\_{AP} \to V \times \mathcal{D}(\mathbb{R})$ and $\mathrm{Up}\_N : L\_{AN} \to V \times \mathcal{B}(\mathbb{R})$, under a suitable decomposition $L\_A = L\_{AD} \cup L\_{AP} \cup L\_{AN}$ of assignment locations. The elements of $L\_{AD}$, $L\_{AP}$ and $L\_{AN}$ represent *deterministic*, *probabilistic* and *nondeterministic* assignments, respectively. See e.g. [33].

An example of a pCFG is shown on the right. It models the program in Fig. 1. The node $l\_4$ is a nondeterministic location. $\mathrm{Unif}(-2, 1)$ is the uniform distribution on the interval $[-2, 1]$.

A *configuration* of a pCFG $\Gamma$ is a pair $(l, \boldsymbol{x}) \in L \times \mathbb{R}^V$ of a location and a valuation. We regard the set $S = L \times \mathbb{R}^V$ of configurations as equipped with the product topology, where $L$ is equipped with the discrete topology. We say a configuration $(l', \boldsymbol{x}')$ is a *successor* of $(l, \boldsymbol{x})$ if $l \mapsto l'$ and the following hold.


An *invariant* of a pCFG $\Gamma$ is a measurable set $I \in \mathcal{B}(S)$ such that $(l\_{init}, \boldsymbol{x}\_{init}) \in I$ and $I$ is closed under taking successors (i.e. if $c \in I$ and $c'$ is a successor of $c$ then $c' \in I$). Use of invariants is a common technique in automated synthesis of supermartingales [1]: it restricts configuration spaces and thus makes the constraints on supermartingales weaker. It is also common to take an invariant as a measurable set [1]. A *run* of $\Gamma$ is an infinite sequence of configurations $c\_0 c\_1 \ldots$ such that $c\_0$ is the initial configuration $(l\_{init}, \boldsymbol{x}\_{init})$ and $c\_{i+1}$ is a successor of $c\_i$ for each $i$. Let $\mathrm{Run}(\Gamma)$ be the set of runs of $\Gamma$.

A *scheduler* resolves nondeterminism: at a location in $L\_N \cup L\_{AN}$, it chooses a distribution of next configurations depending on the history of configurations visited so far. Given a pCFG $\Gamma$ and a scheduler $\sigma$ of $\Gamma$, a probability measure $\nu^{\Gamma}\_{\sigma}$ on $\mathrm{Run}(\Gamma)$ is defined in the usual manner. See [22, Appendix B] for details.

**Definition 2.2 (reaching time $T^{\Gamma}\_{C}$, $T^{\Gamma}\_{C,\sigma}$).** Let $\Gamma$ be a pCFG and $C \subseteq S$ be a set of configurations called a *destination*. The *reaching time* to $C$ is a function $T^{\Gamma}\_{C} : \mathrm{Run}(\Gamma) \to [0,\infty]$ defined by $(T^{\Gamma}\_{C})(c\_0 c\_1 \ldots) = \operatorname{argmin}\_{i \in \mathbb{N}}(c\_i \in C)$. Fixing a scheduler $\sigma$ makes $T^{\Gamma}\_{C}$ a random variable, since $\sigma$ determines a probability measure $\nu^{\Gamma}\_{\sigma}$ on $\mathrm{Run}(\Gamma)$. It is denoted by $T^{\Gamma}\_{C,\sigma}$.
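The reaching time of a single run is just the index of its first configuration inside the destination. A minimal sketch on a finite run prefix follows; since real runs are infinite, the function returns `None` when the prefix never enters the destination, which is an assumption of this sketch rather than part of the definition. The name `reaching_time` and the configuration encoding are hypothetical.

```python
def reaching_time(run_prefix, in_destination):
    """Index of the first configuration in the destination, or None if absent.

    run_prefix: a finite list of configurations (any hashable or tuple encoding).
    in_destination: predicate deciding membership in the destination set C.
    """
    for i, config in enumerate(run_prefix):
        if in_destination(config):
            return i
    return None

# Configurations encoded as (location, value of x); "l_out" marks termination.
prefix = [("l1", 2.0), ("l2", 0.5), ("l_out", -0.3), ("l_out", -0.3)]
steps = reaching_time(prefix, lambda c: c[0] == "l_out")
```

Runtimes are the special case in which `in_destination` recognizes terminating configurations.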

Runtimes of pCFGs are a special case of reaching times, namely to the set of terminating configurations.

The following higher moments are central to our framework. Recall that we are interested in demonic schedulers, i.e. those which make runtimes longer.

**Definition 2.3 ($\mathbb{M}^{\Gamma,k}\_{C,\sigma}$ and $\mathbb{M}^{\Gamma,k}\_{C}$).** Assume the setting of Definition 2.2, and let $k \in \mathbb{N}$ and $c \in S$. We write $\mathbb{M}^{\Gamma,k}\_{C,\sigma}(c)$ for the $k$-th moment of the reaching time of $\Gamma$ from $c$ to $C$ under the scheduler $\sigma$, that is, $\mathbb{M}^{\Gamma,k}\_{C,\sigma}(c) = \mathbb{E}[(T^{\Gamma\_c}\_{C,\sigma})^k] = \int (T^{\Gamma\_c}\_{C})^k \,\mathrm{d}\nu^{\Gamma\_c}\_{\sigma}$, where $\Gamma\_c$ is a pCFG obtained from $\Gamma$ by changing the initial configuration to $c$. Their supremum under varying $\sigma$ is denoted by $\mathbb{M}^{\Gamma,k}\_{C} := \sup\_{\sigma} \mathbb{M}^{\Gamma,k}\_{C,\sigma}$.
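As a small worked instance of these moments, consider a hypothetical loop that exits with probability 1/2 in every step and contains no nondeterminism, so the supremum over schedulers is trivial. Its reaching time is geometrically distributed, and the moments can be approximated by a truncated series. The function name `geometric_moment` and the truncation length are assumptions of this sketch.

```python
def geometric_moment(k, p=0.5, terms=200):
    """Truncated series for E[T^k] where Pr(T = n) = p * (1 - p)**(n - 1), n >= 1."""
    return sum(n ** k * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

first, second, third = (geometric_moment(k) for k in (1, 2, 3))
```

For $p = 1/2$ the series converge to 2, 6 and 26, respectively; the truncation error after 200 terms is negligible.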

### **3 Ranking Supermartingale for Higher Moments**

We introduce one of the main contributions in the paper, a notion of ranking supermartingale that overapproximates higher moments. It is motivated by the following observation: martingale-based reasoning about the second moment must concur with one about the first moment. We conduct a systematic theoretical extension that features an order-theoretic foundation and vector-valued supermartingales. The theory accommodates nondeterminism and continuous distributions, too. We omit some details and proofs; they are in [22, Appendix C].

The fully general theory for higher moments will be presented in Sect. 3.2; we present its restriction to the second moments in Sect. 3.1 for readability.

Prior to these, we review the existing theory of ranking supermartingales, through the lens of order-theoretic fixed points. In doing so we follow [33].

**Definition 3.1 ("nexttime" operation $\overline{\mathbb{X}}$ (pre-expectation)).** Given $\eta : S \to [0,\infty]$, let $\overline{\mathbb{X}}\eta : S \to [0,\infty]$ be the function defined as follows.


Intuitively, $\overline{\mathbb{X}}\eta$ is the expectation of $\eta$ after one transition. Nondeterminism is resolved by the maximal choice.

We define $F\_1 : (S \to [0,\infty]) \to (S \to [0,\infty])$ as follows.

$$(F\_1(\eta))(c) = \begin{cases} 1 + (\overline{\mathbb{X}}\eta)(c) & c \in I \backslash C \\ 0 & \text{otherwise} \end{cases} \text{ (Here } \text{``1+'' accounts for time elapse)}.$$

The function $F\_1$ is an adaptation of the *Bellman operator*, a classic notion in the theory of Markov processes. A similar notion is used e.g. in [19]. The function space $(S \to [0,\infty])$ is a complete lattice, because $[0,\infty]$ is; moreover $F\_1$ is easily seen to be monotone. It is not hard to see either that the expected reaching time $\mathbb{M}^{\Gamma,1}\_{C}$ to $C$ coincides with the least fixed point $\mu F\_1$.

The following theorem is fundamental in theoretical computer science.

**Theorem 3.2 (Knaster–Tarski, [34]).** *Let $(L, \le)$ be a complete lattice and $f : L \to L$ be a monotone function. The least fixed point $\mu f$ is the least prefixed point, i.e. $\mu f = \min\{l \in L \mid f(l) \le l\}$.*

The significance of the Knaster–Tarski theorem in verification lies in the induced proof rule: $f(l) \le l \Rightarrow \mu f \le l$. Instantiating to the expected reaching time $\mathbb{M}^{\Gamma,1}\_{C} = \mu F\_1$, it means $F\_1(\eta) \le \eta \Rightarrow \mathbb{M}^{\Gamma,1}\_{C} \le \eta$, i.e. an arbitrary prefixed point of $F\_1$ (which coincides with the notion of ranking supermartingale [4]) overapproximates the expected reaching time. This proves soundness of ranking supermartingales.
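The proof rule can be tried out on a tiny example. The sketch below assumes a hypothetical two-state Markov chain (no nondeterminism, invariant equal to the whole state space) in which `loop` either stays put or moves to the destination `done`, each with probability 1/2. It approximates the least fixed point of $F\_1$ by iterating from the bottom element, which is adequate for this finite chain, and then checks candidate prefixed points. All names are illustrative.

```python
# Hypothetical chain: from "loop", stay with probability 1/2 or move to "done".
SUCC = {"loop": [(0.5, "loop"), (0.5, "done")]}
DEST = {"done"}

def F1(eta):
    """One application of the Bellman-style operator: 1 + expected successor value."""
    out = {}
    for c in eta:
        if c in DEST:
            out[c] = 0.0
        else:
            out[c] = 1.0 + sum(p * eta[s] for p, s in SUCC[c])
    return out

def iterate(F, bottom, rounds=200):
    """Approximate the least fixed point by repeated application from bottom."""
    eta = bottom
    for _ in range(rounds):
        eta = F(eta)
    return eta

def is_prefixed_point(F, eta):
    """Check F(eta) <= eta pointwise, i.e. the ranking supermartingale condition."""
    image = F(eta)
    return all(image[c] <= eta[c] for c in eta)

lfp = iterate(F1, {"loop": 0.0, "done": 0.0})
candidate = {"loop": 3.0, "done": 0.0}
```

Here the iteration approaches an expected reaching time of 2 from `loop`, and `candidate` is a prefixed point, so by the proof rule its value 3 is a sound (if loose) upper bound.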

#### **3.1 Ranking Supermartingales for the Second Moments**

We extend ranking supermartingales to the second moments. It paves the way to a fully general theory (up to the K-th moments) in Sect. 3.2.

The key in the martingale-based reasoning of expected reaching times (i.e. first moments) was that they are characterized as the least fixed point of a function $F\_1$. Here it is crucial that for an arbitrary random variable $T$, we have $\mathbb{E}[T + 1] = \mathbb{E}[T] + 1$ and therefore we can calculate $\mathbb{E}[T + 1]$ from $\mathbb{E}[T]$. However, this is not the case for second moments. As $\mathbb{E}[(T + 1)^2] = \mathbb{E}[T^2] + 2\mathbb{E}[T] + 1$, calculating the second moment requires not only $\mathbb{E}[T^2]$ but also $\mathbb{E}[T]$. This encourages us to define a vector-valued supermartingale.

**Definition 3.3 (time-elapse function $\text{El}\_1$).** A function $\text{El}\_1 : [0,\infty]^2 \to [0,\infty]^2$ is defined by $\text{El}\_1(x\_1, x\_2) = (x\_1 + 1, x\_2 + 2x\_1 + 1)$.

Then, an extension of $F\_1$ for second moments can be defined as a combination of the time-elapse function $\text{El}\_1$ and the pre-expectation $\overline{\mathbb{X}}$.

**Definition 3.4 ($F\_2$).** Let $I$ be an invariant and $C \subseteq I$ be a Borel set. We define $F\_2 : (S \to [0,\infty]^2) \to (S \to [0,\infty]^2)$ by

$$(F\_2(\eta))(c) = \begin{cases} (\overline{\mathbb{X}}(\text{El}\_1 \circ \eta))(c) & c \in I \backslash C\\ (0,0) & \text{otherwise.} \end{cases}$$

Here $\overline{\mathbb{X}}$ is applied componentwise: $(\overline{\mathbb{X}}(\eta\_1, \eta\_2))(c) = ((\overline{\mathbb{X}}\eta\_1)(c), (\overline{\mathbb{X}}\eta\_2)(c))$.

We can extend the complete lattice structure of $[0,\infty]$ to the function space $S \to [0,\infty]^2$ in a pointwise manner. It is routine to prove that $F\_2$ is monotone with respect to this complete lattice structure. Hence $F\_2$ has the least fixed point. In fact, while $\mathbb{M}^{\Gamma,1}\_{C}$ was characterized as the least fixed point of $F\_1$, a tuple $(\mathbb{M}^{\Gamma,1}\_{C}, \mathbb{M}^{\Gamma,2}\_{C})$ is *not* the least fixed point of $F\_2$ (cf. Example 3.8 and Theorem 3.9). However, the least fixed point of $F\_2$ *overapproximates* the tuple of moments.

**Theorem 3.5.** *For any configuration $c \in I$, $(\mu F\_2)(c) \ge (\mathbb{M}^{\Gamma,1}\_{C}(c), \mathbb{M}^{\Gamma,2}\_{C}(c))$.*

Let $T\_{C,\sigma,n}^{\Gamma} = \min\{n, T\_{C,\sigma}^{\Gamma}\}$. To prove the above theorem, we inductively prove

$$(F\_2)^n(\bot)(c) \ge \left(\int T\_{C,\sigma,n}^{\Gamma\_c} \, \mathrm{d}\nu\_{\sigma}^{\Gamma\_c}, \int (T\_{C,\sigma,n}^{\Gamma\_c})^2 \, \mathrm{d}\nu\_{\sigma}^{\Gamma\_c}\right)$$

for each $\sigma$ and $n$, and take the supremum. See [22, Appendix C] for more details.
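The iterates $(F\_2)^n(\bot)$ used in this argument can be computed directly on a small example. The sketch below assumes a hypothetical two-state chain without nondeterminism, in which `loop` exits to the destination `done` with probability 1/2 per step, so the iterates should approach the first and second moments of a geometric reaching time. The names `el1` and `F2` mirror the definitions above; the chain itself is an illustration, not taken from the paper.

```python
# Hypothetical chain: from "loop", stay with probability 1/2 or move to "done".
SUCC = {"loop": [(0.5, "loop"), (0.5, "done")]}
DEST = {"done"}

def el1(v):
    """Time-elapse function: moments of T + 1 from the moments (E[T], E[T^2])."""
    x1, x2 = v
    return (x1 + 1.0, x2 + 2.0 * x1 + 1.0)

def F2(eta):
    """Componentwise pre-expectation of El_1 applied to the successor values."""
    out = {}
    for c in eta:
        if c in DEST:
            out[c] = (0.0, 0.0)
        else:
            shifted = [(p, el1(eta[s])) for p, s in SUCC[c]]
            out[c] = (sum(p * v[0] for p, v in shifted),
                      sum(p * v[1] for p, v in shifted))
    return out

eta = {"loop": (0.0, 0.0), "done": (0.0, 0.0)}  # bottom element
for _ in range(200):
    eta = F2(eta)
```

After the iteration, `eta["loop"]` is close to `(2, 6)`, which agrees with the mean and second moment of the geometric distribution with parameter 1/2.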

Like a ranking supermartingale for first moments, a ranking supermartingale for second moments is defined as a prefixed point of $F\_2$, i.e. a function $\eta$ such that $\eta \ge F\_2(\eta)$. However, we modify the definition for the sake of implementation.

**Definition 3.6 (ranking supermartingale for second moments).** A ranking supermartingale for second moments is a function $\eta : S \to \mathbb{R}^2$ such that: (i) $\eta(c) \ge (\overline{\mathbb{X}}(\text{El}\_1 \circ \eta))(c)$ for each $c \in I \backslash C$; and (ii) $\eta(c) \ge 0$ for each $c \in I$.

Here, the time-elapse function $\text{El}\_1$ captures a positive decrease of the ranking supermartingale. Even though we only have an inequality in Theorem 3.5, we can prove the following desired property of our supermartingale notion.

**Theorem 3.7.** *If $\eta : S \to \mathbb{R}^2$ is a supermartingale for second moments, then $(\mathbb{M}^{\Gamma,1}\_{C}(c), \mathbb{M}^{\Gamma,2}\_{C}(c)) \le \eta(c)$ for each $c \in I$.*

The following example and theorem show that we cannot replace ≥ with = in Theorem 3.5 in general, but it is possible in the absence of nondeterminism.

**Example 3.8.** The figure on the right shows a pCFG such that $l\_2 \in L\_P$ and all the other locations are in $L\_N$, the initial location is $l\_0$ and $l\_{12}$ is a terminating location. For the pCFG, the left-hand side of the inequality in Theorem 3.5 is $\mu F\_2(l\_0) = (6, 37.5)$. In contrast, if a scheduler $\sigma$ takes a transition from $l\_1$ to $l\_2$ with probability $p$, $(\mathbb{M}^{\Gamma,1}\_{C,\sigma}(l\_0), \mathbb{M}^{\Gamma,2}\_{C,\sigma}(l\_0)) = (6 - \frac{1}{2}p, 36 - \frac{5}{2}p)$. Hence the right-hand side is $(\mathbb{M}^{\Gamma,1}\_{C}(l\_0), \mathbb{M}^{\Gamma,2}\_{C}(l\_0)) = (6, 36)$.

**Theorem 3.9.** *If $L\_N = L\_{AN} = \emptyset$, then $\forall c \in I.\ (\mu F\_2)(c) = (\mathbb{M}^{\Gamma,1}\_{C}(c), \mathbb{M}^{\Gamma,2}\_{C}(c))$.*
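The source of the gap under nondeterminism can be reproduced with a smaller, hypothetical example, which is not the pCFG of Example 3.8. Assume a start location that moves in one step to a nondeterministic location, which in one further step enters either branch A (remaining runtime 3 with certainty) or branch B (remaining runtime 1 with probability 3/4 and 8 with probability 1/4). The fixed-point computation takes a componentwise maximum at the choice and then applies the time-elapse function once more, whereas a real scheduler must commit to one mixture of the branches. All names and numbers are assumptions of this sketch.

```python
def el1(v):
    """Time-elapse function on (first moment, second moment)."""
    x1, x2 = v
    return (x1 + 1.0, x2 + 2.0 * x1 + 1.0)

def moments(dist):
    """First and second moment of a finite distribution {value: probability}."""
    return (sum(p * t for t, p in dist.items()),
            sum(p * t * t for t, p in dist.items()))

# Remaining runtime after the nondeterministic choice (hypothetical numbers).
BRANCH_A = {3: 1.0}
BRANCH_B = {1: 0.75, 8: 0.25}

# Fixed-point style computation: componentwise max at the choice, then one more step.
at_choice = tuple(max(a, b) for a, b in
                  zip(el1(moments(BRANCH_A)), el1(moments(BRANCH_B))))
lfp_at_start = el1(at_choice)

def scheduler_moments(q):
    """Moments of the total runtime when branch B is chosen with probability q."""
    total = {}
    for dist, weight in ((BRANCH_A, 1.0 - q), (BRANCH_B, q)):
        for t, p in dist.items():
            total[t + 2] = total.get(t + 2, 0.0) + weight * p
    return moments(total)

# Both moments are linear in q, so their suprema are attained at q = 0 or q = 1.
true_sup = tuple(max(scheduler_moments(q)[k] for q in (0.0, 1.0)) for k in range(2))
```

In this example the first components of `lfp_at_start` and `true_sup` agree, while the second component of `lfp_at_start` is strictly larger: the maximal first moment (from branch A) and the maximal second moment (from branch B) are combined by the time-elapse step even though no single scheduler attains both.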

#### **3.2 Ranking Supermartingales for the Higher Moments**

We extend the result in Sect. 3.1 to moments higher than second. Firstly, the time-elapse function $\text{El}\_1$ is generalized as follows.

**Definition 3.10 (time-elapse function $\text{El}\_1^{K,k}$).** For $K \in \mathbb{N}$ and $k \in \{1, \ldots, K\}$, a function $\text{El}\_1^{K,k} : [0,\infty]^K \to [0,\infty]$ is defined by $\text{El}\_1^{K,k}(x\_1, \ldots, x\_K) = 1 + \sum\_{j=1}^{k} \binom{k}{j} x\_j$. Here $\binom{k}{j}$ is the binomial coefficient.
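The generalized time-elapse function is the binomial expansion of $(T + 1)^k$ applied to a vector of moments. A minimal sketch, with the hypothetical names `el` and `el_vector`, is:

```python
from math import comb

def el(k, xs):
    """El_1^{K,k}(x_1, ..., x_K) = 1 + sum_{j=1}^{k} C(k, j) * x_j, with K = len(xs)."""
    if not 1 <= k <= len(xs):
        raise ValueError("k must lie in 1..K")
    return 1 + sum(comb(k, j) * xs[j - 1] for j in range(1, k + 1))

def el_vector(xs):
    """All K components at once: the moments of T + 1 from the moments of T."""
    return tuple(el(k, xs) for k in range(1, len(xs) + 1))

# Sanity check with a deterministic T = 3, whose moments are (3, 9, 27):
shifted = el_vector((3, 9, 27))
```

For a deterministic $T = 3$ the result is the moment vector of $T + 1 = 4$, and for $K = 2$ the function coincides with $\text{El}\_1$ from Definition 3.3.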

Again, a monotone function $F_K$ is defined as a combination of the time-elapse function $\text{El}^{K,k}_1$ and the pre-expectation $\overline{\mathbb{X}}$.

**Definition 3.11 ($F_K$).** Let $I$ be an invariant and $C \subseteq I$ be a Borel set. We define $F_K : (S \to [0,\infty]^K) \to (S \to [0,\infty]^K)$ by $F_K(\eta)(c) = (F_{K,1}(\eta)(c), \dots, F_{K,K}(\eta)(c))$, where $F_{K,k} : (S \to [0,\infty]^K) \to (S \to [0,\infty])$ is given by

$$(F\_{K,k}(\eta))(c) = \begin{cases} (\overline{\mathbb{X}}(\text{El}\_1^{K,k} \circ \eta))(c) & c \in I \backslash C\\ 0 & \text{otherwise.} \end{cases}$$

As in Definition 3.6, we define a supermartingale as a prefixed point of $F_K$.

**Definition 3.12 (ranking supermartingale for $K$-th moments).** Given a function $\eta : S \to \mathbb{R}^K$, we define $\eta_1, \dots, \eta_K : S \to \mathbb{R}$ by $(\eta_1(c), \dots, \eta_K(c)) = \eta(c)$. A *ranking supermartingale for $K$-th moments* is a function $\eta : S \to \mathbb{R}^K$ such that for each $k \in \{1, \dots, K\}$, (i) $\eta_k(c) \ge (\overline{\mathbb{X}}(\text{El}^{K,k}_1 \circ \eta))(c)$ for each $c \in I \setminus C$; and (ii) $\eta_k(c) \ge 0$ for each $c \in I$.

For higher moments, we can prove an analogous result to Theorem 3.7.

**Theorem 3.13.** *If $\eta$ is a supermartingale for $K$-th moments, then for each $c \in I$, $(\mathbb{M}^{\Gamma,1}_{C}(c), \dots, \mathbb{M}^{\Gamma,K}_{C}(c)) \le \eta(c)$.*
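To make the prefixed-point condition concrete, the following sketch checks it on a toy example of our own: a two-state Markov chain without nondeterminism, for which we assume the pre-expectation reduces to a plain expectation over successor states, and where the invariant is the whole state space. State `loop` moves to the terminal state `done` with probability 1/2 and otherwise stays, so the runtime is geometric with first and second moments 2 and 6. The names `apply_F`, `is_prefixed_point` and the dictionary encoding are illustrative assumptions, not the paper's implementation.

```python
from fractions import Fraction
from math import comb

def time_elapse(k, xs):
    # El_1^{K,k}(x_1..x_K) = 1 + sum_{j=1..k} C(k, j) * x_j
    return 1 + sum(comb(k, j) * xs[j - 1] for j in range(1, k + 1))

def apply_F(chain, terminal, eta, K):
    """One application of F_K on a finite Markov chain (no nondeterminism).

    chain[s] is a list of (probability, successor) pairs;
    eta[s] is a K-tuple of candidate moment bounds.
    """
    result = {}
    for s in chain:
        if s in terminal:
            result[s] = tuple(Fraction(0) for _ in range(K))
        else:
            result[s] = tuple(
                sum(p * time_elapse(k, eta[t]) for p, t in chain[s])
                for k in range(1, K + 1)
            )
    return result

def is_prefixed_point(chain, terminal, eta, K):
    # eta is a prefixed point when F_K(eta) <= eta in every state and component.
    image = apply_F(chain, terminal, eta, K)
    return all(image[s][k] <= eta[s][k] for s in chain for k in range(K))

half = Fraction(1, 2)
chain = {"loop": [(half, "loop"), (half, "done")], "done": [(Fraction(1), "done")]}
terminal = {"done"}
eta = {"loop": (Fraction(2), Fraction(6)), "done": (Fraction(0), Fraction(0))}
```

With these values `is_prefixed_point(chain, terminal, eta, 2)` holds with equality, and the candidate coincides with the exact moments of the geometric runtime; lowering the second component at `loop` to 5 violates the condition.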

#### **4 From Moments to Tail Probabilities**

We discuss how to obtain upper bounds of tail probabilities of runtimes from upper bounds of higher moments of runtimes. Combined with the result in Sect. 3, it induces a martingale-based method for overapproximating tail probabilities.

We use a concentration inequality. There are many choices of concentration inequalities (see e.g. [3]), and we use a variant of Markov's inequality. We prove that the concentration inequality is not only sound but also complete in a sense.

Formally, our goal is to calculate an upper bound of $\Pr(T^{\Gamma}_{C,\sigma} \ge d)$ for a given deadline $d > 0$, under the assumption that we know upper bounds $u_1, \dots, u_K$ of the moments $\mathbb{E}[T^{\Gamma}_{C,\sigma}], \dots, \mathbb{E}[(T^{\Gamma}_{C,\sigma})^K]$. In other words, we want to overapproximate $\sup_{\mu} \mu([d,\infty])$ where $\mu$ ranges over the set of probability measures on $[0,\infty]$ satisfying $\left(\int x \, d\mu(x), \dots, \int x^K \, d\mu(x)\right) \le (u_1, \dots, u_K)$.

To answer this problem, we use a generalized form of Markov's inequality.

**Proposition 4.1 (see e.g.** [3, §2.1]**).** *Let $X$ be a real-valued random variable and $\phi$ be a nondecreasing and nonnegative function. For any $d \in \mathbb{R}$ with $\phi(d) > 0$,*

$$\Pr(X \ge d) \le \frac{\mathbb{E}[\phi(X)]}{\phi(d)}.$$

By letting $\phi(x) = x^k$ in Proposition 4.1, we obtain the following inequality. It gives an upper bound of the tail probability that is "tight."

**Proposition 4.2.** *Let $X$ be a nonnegative random variable. Assume $\mathbb{E}[X^k] \le u_k$ for each $k \in \{0, \dots, K\}$. Then, for any $d > 0$,*

$$\Pr(X \ge d) \le \min\_{0 \le k \le K} \frac{u\_k}{d^k}. \tag{1}$$

*Moreover, this upper bound is tight: for any $d > 0$, there exists a probability measure such that equality holds in (1).*

*Proof.* The former part is immediate from Proposition 4.1. For the latter part, consider $\mu = p\delta_d + (1 - p)\delta_0$ where $\delta_x$ is the Dirac measure at $x$ and $p$ is the value of the right-hand side of (1).
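The bound (1) and the two-point measure from the proof are easy to evaluate numerically. The sketch below is illustrative only: the function names are ours, and the moment bounds (1, 2, 6) are those of a geometric runtime with success probability 1/2, chosen as an example rather than taken from the paper's experiments.

```python
from fractions import Fraction

def tail_bound(us, d):
    """Right-hand side of (1): min over k of u_k / d^k, where us = [u_0, ..., u_K]."""
    if d <= 0:
        raise ValueError("the deadline d must be positive")
    return min(Fraction(u) / Fraction(d) ** k for k, u in enumerate(us))

def two_point_moments(p, d, K):
    """Moments E[X^k], k = 0..K, of mu = p * delta_d + (1 - p) * delta_0."""
    return [Fraction(1)] + [p * Fraction(d) ** k for k in range(1, K + 1)]

us = [1, 2, 6]          # u_0 = 1 bounds E[X^0] = 1
d = 4
p = tail_bound(us, d)   # min(1, 2/4, 6/16)
witness = two_point_moments(p, d, K=2)
```

For these inputs `p` is 3/8. The witness measure places mass `p` at `d`, so its tail probability at `d` equals the bound, while its moments (1, 3/2, 6) stay within the assumed bounds.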

By combining Theorem 3.13 with Proposition 4.2, we obtain the following corollary. We can use it for overapproximating tail probabilities.

**Corollary 4.3.** *Let $\eta : S \to \mathbb{R}^K$ be a ranking supermartingale for $K$-th moments. For each scheduler $\sigma$ and a deadline $d > 0$,*

$$\Pr(T\_{C,\sigma}^{\Gamma} \ge d) \le \min\_{0 \le k \le K} \frac{\eta\_k(l\_{\text{init}}, x\_{\text{init}})}{d^k}.\tag{2}$$

*Here $\eta_0, \dots, \eta_K$ are defined by $\eta_0(c) = 1$ and $\eta(c) = (\eta_1(c), \dots, \eta_K(c))$.*

Note that if $K = 1$, Corollary 4.3 is essentially the same as [5, Thm 4]. Note also that for each $K$ there exists $d > 0$ such that $\frac{\eta_K(l_{\text{init}}, x_{\text{init}})}{d^K} = \min_{0 \le k \le K} \frac{\eta_k(l_{\text{init}}, x_{\text{init}})}{d^k}$. Hence higher moments become useful in overapproximating tail probabilities as $d$ gets large. Later in Sect. 6, we demonstrate this fact experimentally.
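One way to see why such a deadline exists, sketched under the additional assumption that the values $\eta_k(l_{\text{init}}, x_{\text{init}})$ (abbreviated $\eta_k$ here) are positive for every $k < K$: comparing the $K$-th term with the $k$-th term gives

$$\frac{\eta\_K}{d^K} \le \frac{\eta\_k}{d^k} \iff d^{K-k} \ge \frac{\eta\_K}{\eta\_k}, \qquad \text{so any } d \ge \max\_{0 \le k < K} \left(\frac{\eta\_K}{\eta\_k}\right)^{1/(K-k)} \text{ suffices.}$$

Beyond this threshold the $K$-th moment yields the smallest of the $K+1$ candidate bounds.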

#### **5 Template-Based Synthesis Algorithm**

We discuss an automated synthesis algorithm that calculates an upper bound for the $K$-th moment of the runtime of a pCFG using a supermartingale as in Definition 3.6 or 3.12. It takes a pCFG $\Gamma$, an invariant $I$, a set $C \subseteq I$ of configurations, and a natural number $K$ as input and outputs an upper bound of the $K$-th moment.

Our algorithm is adapted from existing template-based algorithms for synthesizing a ranking supermartingale (for first moments) [4,6,7]. It fixes a linear or polynomial template with unknown coefficients for a supermartingale and, using numerical methods like linear programming (LP) or semidefinite programming (SDP), calculates a valuation of the unknown coefficients so that the axioms of ranking supermartingales for $K$-th moments are satisfied.

We briefly explain the algorithms here. See [22, Appendix D] for details.

**Linear Template.** Our linear template-based algorithm is adapted from [4,7]. We assume that $\Gamma$, $I$ and $C$ are all "linear" in the sense that expressions appearing in $\Gamma$ are all linear and $I$ and $C$ are represented by linear inequalities. To deal with assignments from a distribution like x := Norm(0, 1), we also assume that expected values of distributions appearing in $\Gamma$ are known.

The algorithm first fixes a template for a supermartingale: for each location $l$, it fixes a $K$-tuple $\left(\sum_{j=1}^{|V|} a^l_{j,1} x_j + b^l_1, \dots, \sum_{j=1}^{|V|} a^l_{j,K} x_j + b^l_K\right)$ of linear formulas. Here each $a^l_{j,i}$ and $b^l_i$ is an unknown variable called a *parameter*. The algorithm next collects conditions on the parameters so that the tuples constitute a ranking supermartingale for $K$-th moments. This results in a conjunction of formulas of the form $\varphi_1 \ge 0 \wedge \dots \wedge \varphi_m \ge 0 \Rightarrow \psi \ge 0$. Here $\varphi_1, \dots, \varphi_m$ are linear formulas without parameters and $\psi$ is a linear formula whose coefficients are linear in the parameters. By Farkas' lemma (see e.g. [29, Cor 7.1h]) we can turn such formulas into linear inequalities over the parameters by adding new variables. The feasibility of the resulting system can be checked efficiently with an LP solver. We naturally wish to minimize an upper bound of the $K$-th moment, i.e. the last component of $\eta(l_{\text{init}}, x_{\text{init}})$. We can minimize it by setting it as the objective function of the LP problem.
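As a reminder of how the Farkas step works, here is the affine form of the lemma in notation of our own (the vectors $a_i$, $c$ and the scalars $b_i$, $e$, $\lambda_i$ are illustrative symbols, not taken from [29]). Write $\varphi_i(x) = a_i^{\top} x + b_i$ and $\psi(x) = c^{\top} x + e$, where $c$ and $e$ depend linearly on the parameters. Assuming the premise $\varphi_1 \ge 0 \wedge \dots \wedge \varphi_m \ge 0$ is satisfiable, the implication holds for all $x$ if and only if there exist multipliers such that

$$\lambda\_0, \lambda\_1, \dots, \lambda\_m \ge 0, \qquad c = \sum\_{i=1}^{m} \lambda\_i a\_i, \qquad e = \lambda\_0 + \sum\_{i=1}^{m} \lambda\_i b\_i.$$

Since the $a_i$ and $b_i$ are constants, these are linear constraints over the parameters and the new variables $\lambda_i$, which is what makes the problem an LP.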

**Polynomial Template.** The polynomial template-based algorithm is based on [6] and is similar to the linear template-based one. This time, $\Gamma$, $I$ and $C$ can be "polynomial." To deal with assignments from distributions, we assume that the $n$-th moments of distributions in $\Gamma$ are easily calculated for each $n \in \mathbb{N}$.

It first fixes a polynomial template for a supermartingale, i.e. it assigns each location $l$ a $K$-tuple of polynomial expressions with unknown coefficients. Like the linear template-based algorithm, the algorithm reduces the axioms of supermartingales for higher moments to a conjunction of formulas of the form $\varphi_1 \ge 0 \wedge \dots \wedge \varphi_m \ge 0 \Rightarrow \psi \ge 0$. This time, each $\varphi_i$ is a polynomial formula without parameters and $\psi$ is a polynomial formula whose coefficients are *linear* formulas over the parameters. In the polynomial case, a conjunction of such formulas is reduced to an SDP problem using a theorem called Positivstellensatz (we used a variant called Schmüdgen's Positivstellensatz [28]). We solve the resulting problem using an SDP solver, setting $\eta(l_{\text{init}}, x_{\text{init}})$ as the objective function.

#### **6 Experiments**

We implemented two programs in OCaml to synthesize a supermartingale based on (a) a linear template and (b) a polynomial template. The programs translate a given randomized program to a pCFG and output an LP or SDP problem as described in Sect. 5. An invariant I and a terminal configuration C for the input program are specified manually. See e.g. [20] for automatic synthesis of an invariant. For linear templates, we have used GLPK (v4.65) [12] as an LP solver. For polynomial templates, we have used SOSTOOLS (v3.03) [31] (a sum-of-squares optimization tool that internally uses an SDP solver) on Matlab (R2018b). We used SDPT3 (v4.0) [30] as an SDP solver. The experiments were carried out on a Surface Pro 4 with an Intel Core i5-6300U (2.40 GHz) and 8 GB RAM. We tested our implementation for the following two programs and their variants, which were also used in the literature [7,19]. Their code is in [22, Appendix E].

*Coupon collector's problem.* A probabilistic model of collecting coupons enclosed in cereal boxes. There exist n types of coupons, and one repeatedly buys cereal boxes until all the types of coupons are collected. We consider two cases: (1-1) n = 2 and (1-2) n = 4. We tested the linear template program for them.

*Random walk.* We used three variants of 1-dimensional random walks: (2-1) an integer-valued one, (2-2) a real-valued one with assignments from continuous distributions, and (2-3) one with adversarial nondeterminism; and two variants of 2-dimensional random walks, (2-4) and (2-5), with assignments from continuous distributions and adversarial nondeterminism. We tested both the linear and the polynomial template programs for these examples.

**Experimental results.** We measured execution times needed for Step 1 in Fig. 2. The results are in Table 1. Execution times are less than 0.2 s for linear template programs and several minutes for polynomial template programs. Upper bounds of tail probabilities obtained from Proposition 4.2 are in Fig. 3.

We can see that our method is applicable even with nondeterministic branching ((2-3), (2-4) and (2-5)) or assignments from continuous distributions ((2-2), (2-4) and (2-5)). We can use a linear template for bounding higher moments as long as there exists a supermartingale for higher moments representable by linear expressions ((1-1), (1-2) and (2-3)). In contrast, for (2-1), (2-2) and (2-4), only a polynomial template program found a supermartingale for second moments.

One would expect the polynomial template program to give a better bound than the linear one because a polynomial template is more expressive than a linear one. However, this did not hold for some test cases, probably because of numerical errors of the SDP solver. For example, (2-1) has a supermartingale for third moments that can be checked by a hand calculation, but the SDP solver returned "infeasible" in the polynomial template program. It appears that our program fails when large numbers are involved (e.g. the third moments of (2-1), (2-2) and (2-3)). We have also tested a variant of (2-1) where the initial position is multiplied by 10000. Then the SDP solver returned "infeasible" in the polynomial template program while the linear template program returned a nontrivial bound. Hence it seems that numerical errors are likely to occur in the polynomial template program when large numbers are involved.

Figure 3 shows that the bigger the deadline $d$ is, the more useful higher moments become (cf. the remark just after Corollary 4.3). For example, in (1-2), an upper bound of $\Pr(T^{\Gamma}_{C,\sigma} \ge 100)$ calculated from the upper bound of the first moment is 0.680 while that calculated from the fifth moment is 0.105.

To show the merit of our method compared with sampling-based methods, we calculated a tail probability bound for a variant of (2-2) (shown in Fig. 4) with a deadline $d = 10^{11}$. Because of its very long expected runtime, a sampling-based method would not work for it. In contrast, the linear template-based program gave an upper bound $\Pr(T^{\Gamma}_{C,\sigma} \ge 10^{11}) \le 5000000025/10^{11} \approx 0.05$ in almost the same execution time as (2-2) (< 0.02 s).

**Fig. 3.** Upper bounds of the tail probabilities (except (2-5)). Each gray line is the value of $u_k / d^k$ where $u_k$ is the best upper bound in Table 1 of $k$-th moments and $d$ is a deadline. Each black line is the minimum of gray lines, i.e. the upper bound by Proposition 4.2. The red lines in (1-1), (1-2) and (2-1) show the true tail probabilities calculated analytically. The red points in (2-2) show tail probabilities calculated by Monte Carlo sampling where the number of trials is 100000000. We neither calculated nor approximated the true tail probabilities for (2-4) and (2-5) because doing so seems difficult for these examples due to nondeterminism. (Color figure online)

```
x := 200000000;
while true do
  if prob(0.7) then
    z := Unif(0,1);
    x := x - z
  else
    z := Unif(0,1);
    x := x + z
  fi;
  refute (x < 0)
od
```

**Fig. 4.** A variant of (2-2).
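A back-of-the-envelope calculation illustrates why sampling is impractical for the Fig. 4 variant. The sketch assumes, as our reading of the program, that each iteration moves x by a Unif(0,1) amount (mean 1/2) downwards with probability 0.7 and upwards otherwise, and that the loop ends once x < 0. The iteration estimate ignores the small overshoot at the boundary and is a heuristic of ours, not a result from the paper.

```python
from fractions import Fraction

# Expected change of x per loop iteration: 0.7 * (-1/2) + 0.3 * (+1/2).
mean_step = Fraction(1, 2)
drift = Fraction(7, 10) * (-mean_step) + Fraction(3, 10) * mean_step

# Rough expected number of iterations for x to fall from 2 * 10^8 below 0.
x_init = 200_000_000
approx_iterations = Fraction(x_init) / -drift

# Bound reported in the text for the deadline d = 10^11.
bound = Fraction(5_000_000_025, 10**11)
```

The drift is -1/5 per iteration, so a single sampled run would need on the order of 10^9 iterations, whereas the reported bound evaluates to roughly 0.05 without executing the program.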

### **7 Related Work**

**Martingale-Based Analysis of Randomized Programs.** Martingale-based methods are widely studied for the termination analysis of randomized programs. One of the first is *ranking supermartingales*, introduced in [4] for proving almost sure termination. The theory of ranking supermartingales has since been extended actively: accommodating nondeterminism [1,6,7,11], syntax-oriented composition of supermartingales [11], proving properties beyond termination/reachability [13], and so on. Automated template-based synthesis of supermartingales by constraint solving has been pursued, too [1,4,6,7].

Other martingale-based methods that are fundamentally different from ranking supermartingales have been devised, too. They include: different notions of *repulsing supermartingales* for refuting termination (in [8,33]; also studied in control theory [32]); and *multiply-scaled submartingales* for underapproximating reachability probabilities [33,36]. See [33] for an overview.

In the literature on martingale-based methods, the one closest to this work is [5]. Among its contributions is the analysis of tail probabilities. It is done by either of the following combinations: (1) *difference-bounded* ranking supermartingales and the corresponding Azuma's concentration inequality; and (2) (not necessarily difference-bounded) ranking supermartingales and Markov's concentration inequality. When we compare these two methods with ours, the first method requires repeated martingale synthesis for different parameter values, which can pose a performance challenge. The second method corresponds to the restriction of our method to the first moment; recall that we showed the advantage of using higher moments, theoretically (Sect. 4) and experimentally (Sect. 6). See [22, Appendix F.1] for detailed discussions. Implementation is lacking in [5], too.

We use Markov's inequality to calculate an upper bound of $\Pr(T_{\text{run}} \ge d)$ from a ranking supermartingale. In [7], Hoeffding's and Bernstein's inequalities are used for the same purpose. As the upper bounds obtained by these inequalities are exponentially decreasing with respect to $d$, they are asymptotically tighter than our bound obtained by Markov's inequality, assuming that we use the same ranking supermartingale. However, Hoeffding's and Bernstein's inequalities are applicable to limited classes of ranking supermartingales (so-called difference-bounded and incremental ones, respectively). There exists a randomized program whose tail probability for runtimes is decreasing only polynomially (not exponentially, see [22, Appendix G]); this witnesses that there are cases where the methods in [7] do not apply but ours does.

The work [1] is also close to ours in that their supermartingales are vector-valued. The difference is in the orders: in [1] they use the *lexicographic* order between vectors, and they aim to prove almost sure termination. In contrast, we use the *pointwise* order between vectors, for overapproximating higher moments.

**The Predicate-Transformer Approach to Runtime Analysis.** In the runtime/termination analysis of randomized programs, another principal line of work uses *predicate transformers* [2,17,19], following earlier work on probabilistic predicate transformers such as [21,25]. In fact, from the mathematical point of view, the main construct for witnessing runtime/termination in those predicate transformer calculi (called *invariants*, see e.g. [19]) is essentially the same thing as ranking supermartingales. Therefore the difference between the martingale-based and predicate-transformer approaches is mostly a matter of presentation: the predicate-transformer approach is more closely tied to program syntax and has a stronger deductive flavor. It also seems that there is less work on automated synthesis in the predicate-transformer approach.

In the predicate-transformer approach, the work [17] is the closest to ours, in that it studies *variance* of runtimes of randomized programs. The main differences are as follows: (1) computing tail probabilities is not pursued in [17]; (2) their extension from expected runtimes to variance involves an additional variable τ, which poses a challenge in automated synthesis as well as in generalization to even higher moments; and (3) they do not pursue automated analysis. See Appendix F.2 of the extended version [22] for further details.

**Higher Moments of Runtimes.** Computing and using higher moments of runtimes of probabilistic systems—generalizing randomized programs—has been pursued before. In [9], computing moments of runtimes of *finite-state* Markov chains is reduced to a certain linear equation. In the study of randomized algorithms, the survey [10] collects a number of methods, among which are some tail probability bounds using higher moments. Unlike ours, none of these methods are language-based static ones. They do not allow automated analysis.

**Other Potential Approaches to Tail Probabilities.** We discuss potential approaches to estimating tail probabilities, other than the martingale-based one.

*Sampling* is widely employed for approximating behaviors of probabilistic systems; especially so in the field of probabilistic programming languages, since exact symbolic reasoning is hard in the presence of conditioning. See e.g. [35]. We also used sampling to estimate tail probabilities in (2-2), Fig. 3. The main advantages of our current approach over sampling are threefold: (1) our upper bounds come with a mathematical guarantee, while the sampling bounds can always be erroneous; (2) it requires ingenuity to sample programs with nondeterminism; and (3) programs whose execution can take millions of years can still be analyzed by our method in a reasonable time, without executing them. The last advantage is shared by static, language-based analysis methods in general; see e.g. [2].

Another potential method is probabilistic model checkers such as PRISM [23]. Their algorithms are usually only applicable to finite-state models, and thus not to randomized programs in general. Nevertheless, fixing a deadline $d$ can make the reachable part $S_{\le d}$ of the configuration space $S$ finite, opening up the possibility of use of model checkers. It is an open question how to do so precisely, and the following challenges are foreseen: (1) if the program contains continuous distributions, the reachable part $S_{\le d}$ becomes infinite; (2) even if $S_{\le d}$ is finite, one has to repeat (supposedly expensive) runs of a model checker for each choice of $d$. In contrast, in our method, an upper bound for the tail probability $\Pr(T_{\text{run}} \ge d)$ is symbolically expressed as a function of $d$ (Proposition 4.2). Therefore, estimating tail probabilities for varying $d$ is computationally cheap.

### **8 Conclusions and Future Work**

We provided a technique to obtain an upper bound of the tail probability of runtimes given a randomized algorithm and a deadline. We first extended the ordinary ranking supermartingale notion using the order-theoretic characterization so that it can calculate upper bounds of higher moments of runtimes for randomized programs. Then by using a suitable concentration inequality, we introduced a method to calculate an upper bound of tail probabilities from upper bounds of higher moments. Our method is not only sound but also complete in a sense. Our method was obtained by combining our supermartingale and the concentration inequality. We also implemented an automated synthesis algorithm and demonstrated the applicability of our framework.

**Future Work.** Example 3.8 shows that our supermartingale is not complete: it sometimes fails to give a tight bound for higher moments. Studying and mitigating this incompleteness is one possible direction of future work. For example, the following questions would be interesting: Can bounds given by our supermartingale be arbitrarily bad? Can we remedy the incompleteness by restricting the type of nondeterminism? Can we define a complete supermartingale?

Making our current method compositional is another direction of future research. Use of continuations, as in [18], can be a technical solution.

We are also interested in improving the implementation. The polynomial template program failed to give an upper bound for higher moments because of numerical errors (see Sect. 6). We wish to remedy this situation. There exist several studies on using numerical solvers for verification without being affected by numerical errors [14–16,26,27]. We might make use of these works for improvements.

**Acknowledgement.** We thank the anonymous referees for useful comments. The authors are supported by JST ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), the JSPS-INRIA Bilateral Joint Research Project "CRECOGI," and JSPS KAKENHI Grant No. 15KT0012 & 15K11984. Natsuki Urabe is supported by JSPS KAKENHI Grant No. 16J08157.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Computing the Expected Execution Time of Probabilistic Workflow Nets**

Philipp J. Meyer, Javier Esparza, and Philip Offtermatt

Technical University of Munich, Munich, Germany {meyerphi,esparza,offtermp}@in.tum.de

**Abstract.** Free-Choice Workflow Petri nets, also known as Workflow Graphs, are a popular model in Business Process Modeling.

In this paper we introduce Timed Probabilistic Workflow Nets (TPWNs), and give them a Markov Decision Process (MDP) semantics. Since the time needed to execute two parallel tasks is the maximum of the times, and not their sum, the expected time cannot be directly computed using the theory of MDPs with rewards. In our first contribution, we overcome this obstacle with the help of "earliest-first" schedulers, and give a single exponential-time algorithm for computing the expected time.

In our second contribution, we show that computing the expected time is #P-hard, and so polynomial algorithms are very unlikely to exist. Further, #P-hardness holds even for workflows with a very simple structure in which all transition times are 1 or 0, and all probabilities are 1 or 0.5.

Our third and final contribution is an experimental investigation of the runtime of our algorithm on a set of industrial benchmarks. Despite the negative theoretical results, the results are very encouraging. In particular, the expected time of every workflow in a popular benchmark suite with 642 workflow nets can be computed in milliseconds. Data or code related to this paper is available at: [24].

#### **1 Introduction**

Workflow Petri Nets are a popular model for the representation and analysis of business processes [1,3,7]. They are used as back-end for different notations like BPMN (Business Process Modeling Notation), EPC (Event-driven Process Chain), and UML Activity Diagrams.

The project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme under grant agreement No. 787367 (PaVeS). Further it is partially supported by the DFG Project No. 273811150 (Negotiations: A Model for Tractable Concurrency).

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 154–171, 2019. https://doi.org/10.1007/978-3-030-17465-1\_9

There is recent interest in extending these notations with quantitative information, like probabilities, costs, and time. The final goal is the development of tool support for computing performance metrics, like the average cost or the average runtime of a business process.

In a former paper we introduced Probabilistic Workflow Nets (PWN), a foundation for the extension of Petri nets with probabilities and rewards [11]. We presented a polynomial time algorithm for the computation of the expected cost of free-choice workflow nets, a subclass of PWN of particular interest for the workflow process community (see e.g. [1,10,13,14]). For example, 1386 of the 1958 nets in the most popular benchmark suite in the literature are free-choice Workflow Nets [12].

In this paper we introduce Timed PWNs (TPWNs), an extension of PWNs with time. Following [11], we define a semantics in terms of Markov Decision Processes (MDPs), where, loosely speaking, the nondeterminism of the MDP models absence of information about the order in which concurrent transitions are executed. For every scheduler, the semantics assigns to the TPWN an expected time to termination. Using results of [11], we prove that this expected time is actually independent of the scheduler, and so that the notion "expected time of a TPWN" is well defined.

We then proceed to study the problem of computing the expected time of a sound TPWN (loosely speaking, of a TPWN that terminates successfully with probability 1). The expected cost and the expected time have a different interplay with concurrency. The cost of executing two tasks in parallel is the sum of the costs (cost models e.g. salaries or power consumption), while the execution time of two parallel tasks is the maximum of their individual execution times. For this reason, standard reward-based algorithms for MDPs, which assume additivity of the reward along a path, cannot be applied.
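The difference can be seen on a toy example of our own: two independent parallel tasks, each taking 1 or 3 time units with probability 1/2. The sketch enumerates the four joint outcomes; the durations and variable names are hypothetical and only serve as an illustration.

```python
from fractions import Fraction
from itertools import product

# Each task takes 1 or 3 time units with probability 1/2 (illustrative values).
task = [(Fraction(1, 2), 1), (Fraction(1, 2), 3)]

# Additive reward (cost-like): expectation of the sum of both durations.
expected_sum = sum(p * q * (a + b) for (p, a), (q, b) in product(task, task))

# Parallel execution time: expectation of the maximum of both durations.
expected_max = sum(p * q * max(a, b) for (p, a), (q, b) in product(task, task))

# Expected duration of a single task.
expected_single = sum(p * a for p, a in task)
```

Here the expected sum is 4 and a single task takes 2 in expectation, but the expected maximum is 5/2. It is neither the sum nor the maximum of the per-task expectations, which is why additive rewards do not capture the time of parallel tasks.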

Our solution to this problem uses the fact that the expected time of a TPWN is independent of the scheduler. We define an "earliest-first" scheduler which, loosely speaking, resolves the nondeterminism of the MDP by picking transitions with earliest possible firing time. Since at first sight the scheduler needs infinite memory, its corresponding Markov chain is infinite-state, and so of no help. However, we show how to construct another finite-state Markov chain with additive rewards, whose expected reward is equal to the expected time of the infinite-state chain. This finite-state Markov chain can be exponentially larger than the TPWN, and so our algorithm has exponential complexity. We prove that computing the expected time is #P-hard, even for free-choice TPWNs in which all transition times are either 1 or 0, and all probabilities are 1 or 1/2. So, in particular, the existence of a polynomial algorithm implies P = NP.

In the rest of the paper we show that, despite these negative results, our algorithm behaves well in practice. For all 642 sound free-choice nets of the benchmark suite of [12], computing the expected time never takes longer than a few milliseconds. Looking for a more complicated set of examples, we study a TPWN computed from a set of logs by process mining. We observe that the computation of the expected time is sensitive to the distribution of the execution time of a task. Still, our experiments show that even for complicated distributions leading to TPWNs with hundreds of transitions and times spanning two orders of magnitude the expected time can be computed in minutes.

All missing proofs can be found in the Appendix of the full version [19].

### **2 Preliminaries**

We introduce some preliminary definitions. The full version [19] gives more details.

**Workflow Nets.** A *workflow net* is a tuple $\mathbf{N} = (P, T, F, i, o)$ where $P$ and $T$ are disjoint finite sets of *places* and *transitions*; $F \subseteq (P \times T) \cup (T \times P)$ is a set of *arcs*; $i, o \in P$ are distinguished *initial* and *final* places such that $i$ has no incoming arcs, $o$ has no outgoing arcs, and the graph $(P \cup T, F \cup \{(o, i)\})$ is strongly connected. For $x \in P \cup T$, we write ${}^\bullet x$ for the set $\{y \mid (y, x) \in F\}$ and $x^\bullet$ for $\{y \mid (x, y) \in F\}$. We call ${}^\bullet x$ (resp. $x^\bullet$) the *preset* (resp. *postset*) of $x$. We extend this notion to sets $X \subseteq P \cup T$ by ${}^\bullet X \stackrel{\text{def}}{=} \bigcup_{x \in X} {}^\bullet x$ resp. $X^\bullet \stackrel{\text{def}}{=} \bigcup_{x \in X} x^\bullet$. The notions of marking, enabled transitions, transition firing, firing sequence, and reachable marking are defined as usual. The *initial marking* (resp. *final marking*) of a workflow net, denoted by $\boldsymbol{i}$ (resp. $\boldsymbol{o}$), has one token on place $i$ (resp. $o$), and no tokens elsewhere. A firing sequence $\sigma$ is a *run* if $\boldsymbol{i} \xrightarrow{\sigma} \boldsymbol{o}$, i.e. if it leads to the final marking. $\mathit{Run}_{\mathbf{N}}$ denotes the set of all runs of $\mathbf{N}$.
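The definitions above translate directly into a small data structure. The following Python sketch is our own illustration, not code from the paper: the class and method names are hypothetical, and markings are represented as sets of places, which is adequate only when no place ever holds more than one token.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorkflowNet:
    places: frozenset
    transitions: frozenset
    arcs: frozenset      # pairs (x, y): place-to-transition or transition-to-place
    initial: str         # place i
    final: str           # place o

    def preset(self, x):
        return frozenset(y for (y, z) in self.arcs if z == x)

    def postset(self, x):
        return frozenset(z for (y, z) in self.arcs if y == x)

    def enabled(self, marking, t):
        # A transition is enabled when every place of its preset is marked.
        return self.preset(t) <= marking

    def fire(self, marking, t):
        if not self.enabled(marking, t):
            raise ValueError(f"{t} is not enabled")
        return (marking - self.preset(t)) | self.postset(t)

    def is_run(self, sigma):
        # A run leads from the initial marking {i} to the final marking {o}.
        marking = frozenset({self.initial})
        for t in sigma:
            if not self.enabled(marking, t):
                return False
            marking = self.fire(marking, t)
        return marking == frozenset({self.final})

# Example net: t1 forks into two parallel branches (a and b), t2 joins them.
net = WorkflowNet(
    places=frozenset({"i", "p1", "p2", "p3", "p4", "o"}),
    transitions=frozenset({"t1", "a", "b", "t2"}),
    arcs=frozenset({
        ("i", "t1"), ("t1", "p1"), ("t1", "p2"),
        ("p1", "a"), ("a", "p3"), ("p2", "b"), ("b", "p4"),
        ("p3", "t2"), ("p4", "t2"), ("t2", "o"),
    }),
    initial="i",
    final="o",
)
```

In this example both `["t1", "a", "b", "t2"]` and `["t1", "b", "a", "t2"]` are runs, while `["t1", "a", "t2"]` is not a firing sequence because `t2` needs tokens on both `p3` and `p4`.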

**Soundness and 1-safeness.** Well-designed workflows should be free of deadlocks and livelocks. This idea is captured by the notion of soundness [1,2]: A workflow net is *sound* if the final marking is reachable from any reachable marking.<sup>1</sup> Further, in this paper we restrict ourselves to 1-safe workflows: A marking $M$ of a workflow net $W$ is *1-safe* if $M(p) \le 1$ for every place $p$, and $W$ itself is *1-safe* if every reachable marking is 1-safe. We identify 1-safe markings $M$ with the set $\{p \in P \mid M(p) = 1\}$.

**Independence, concurrency, conflict** [22]**.** Two transitions t<sub>1</sub>, t<sub>2</sub> of a workflow net are *independent* if •t<sub>1</sub> ∩ •t<sub>2</sub> = ∅, and *dependent* otherwise. Given a 1-safe marking M, two transitions are *concurrent at* M if M enables both of them and they are independent, and *in conflict at* M if M enables both of them and they are dependent. Finally, we recall the definition of Mazurkiewicz equivalence. Let **N** = (P, T, F, i, o) be a 1-safe workflow net. The relation ≡<sub>1</sub> ⊆ T<sup>∗</sup> × T<sup>∗</sup> is defined as follows: σ ≡<sub>1</sub> τ if there are independent transitions t<sub>1</sub>, t<sub>2</sub> and sequences σ′, σ″ ∈ T<sup>∗</sup> such that σ = σ′ t<sub>1</sub> t<sub>2</sub> σ″ and τ = σ′ t<sub>2</sub> t<sub>1</sub> σ″. Two sequences σ, τ ∈ T<sup>∗</sup> are *Mazurkiewicz equivalent* if σ ≡ τ, where ≡ is the reflexive and transitive closure of ≡<sub>1</sub>. Observe that σ ∈ T<sup>∗</sup> is a firing sequence iff every sequence τ ≡ σ is a firing sequence.

<sup>1</sup> In [2], which examines many different notions of soundness, this is called *easy soundness*.

**Confusion-freeness, free-choice workflows.** Let t be a transition of a workflow net, and let M be a 1-safe marking that enables t. The *conflict set of* t *at* M, denoted C(t, M), is the set of transitions in conflict with t at M. A set U of transitions is a *conflict set* of M if there is a transition t such that U = C(t, M). The conflict sets of M are given by C(M) ≝ ⋃<sub>t∈T</sub> C(t, M). A 1-safe workflow net is *confusion-free* if for every reachable marking M and every transition t enabled at M, every transition u concurrent with t at M satisfies C(u, M) = C(u, M \ •t) = C(u, (M \ •t) ∪ t•). The following result follows easily from the definitions (see also [11]):

**Lemma 1** [11]**.** *Let* **N** *be a 1-safe workflow net. If* **N** *is confusion-free then for every reachable marking* M *the conflict sets* C(M) *are a partition of the set of transitions enabled at* M*.*

A workflow net is *free-choice* if for every two places p<sub>1</sub>, p<sub>2</sub>, if p<sub>1</sub>• ∩ p<sub>2</sub>• ≠ ∅, then p<sub>1</sub>• = p<sub>2</sub>•. Any free-choice net is confusion-free, and the conflict set of a transition t enabled at a marking M is given by C(t, M) = (•t)• (see e.g. [11]).
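The definitions of this section can be made concrete with a small sketch. The net below is an assumption: its arcs are reconstructed from the running examples of the following sections, since the figure itself is not reproduced in the text. The helper names (`preset`, `conflict_set`, and so on) are illustrative only.

```python
# Hypothetical encoding of a small workflow net. The arcs are an assumption
# reconstructed from the running examples, not taken from a figure.
PLACES = {"i", "p1", "p2", "p3", "p4", "o"}
TRANSITIONS = {"t1", "t2", "t3", "t4", "t5"}
ARCS = {
    ("i", "t1"), ("t1", "p1"), ("t1", "p3"),
    ("p1", "t2"), ("t2", "p1"),
    ("p1", "t3"), ("t3", "p2"),
    ("p3", "t4"), ("t4", "p4"),
    ("p2", "t5"), ("p4", "t5"), ("t5", "o"),
}

def preset(x):
    """Preset of a node x: all y with an arc (y, x)."""
    return frozenset(y for (y, z) in ARCS if z == x)

def postset(x):
    """Postset of a node x: all z with an arc (x, z)."""
    return frozenset(z for (y, z) in ARCS if y == x)

def enabled(marking, t):
    """A 1-safe marking (a set of places) enables t if it covers the preset of t."""
    return preset(t) <= marking

def independent(t, u):
    """Two transitions are independent if their presets are disjoint."""
    return not (preset(t) & preset(u))

def conflict_set(t, marking):
    """C(t, M): enabled transitions that are dependent on t (t itself included)."""
    return frozenset(u for u in TRANSITIONS
                     if enabled(marking, u) and not independent(t, u))

def conflict_sets(marking):
    """All conflict sets of a marking."""
    return {conflict_set(t, marking) for t in TRANSITIONS if enabled(marking, t)}

def is_free_choice():
    """Places with overlapping postsets must have identical postsets."""
    return all(not (postset(p) & postset(q)) or postset(p) == postset(q)
               for p in PLACES for q in PLACES)

M = frozenset({"p1", "p3"})
print(conflict_sets(M))      # the two conflict sets {t2, t3} and {t4}
print(is_free_choice())      # True
```

For this net the conflict sets at {p1, p3} partition the enabled transitions, as Lemma 1 predicts, and C(t2, M) coincides with (•t2)•.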

### **3 Timed Probabilistic Workflow Nets**

In [11] we introduced a probabilistic semantics for confusion-free workflow nets. Intuitively, at every reachable marking a choice between two concurrent transitions is resolved nondeterministically by a scheduler, while a choice between two transitions in conflict is resolved probabilistically; the probability of choosing each transition is proportional to its *weight*. For example, in the net in Fig. 1a, at the marking {p<sub>1</sub>, p<sub>3</sub>}, the scheduler can choose between the conflict sets {t<sub>2</sub>, t<sub>3</sub>} and {t<sub>4</sub>}, and if {t<sub>2</sub>, t<sub>3</sub>} is chosen, then t<sub>2</sub> is chosen with probability 1/5 and t<sub>3</sub> with probability 4/5. We extend Probabilistic Workflow Nets by assigning to each transition t a natural number τ(t) modeling the time it takes for the transition to fire, once it has been selected.<sup>2</sup>

**Definition 1 (Timed Probabilistic Workflow Nets).** *A* Timed Probabilistic Workflow Net *(TPWN) is a tuple* W = (**N**, w, τ) *where* **N** = (P, T, F, i, o) *is a 1-safe confusion-free workflow net,* w : T → ℚ<sub>&gt;0</sub> *is a* weight function*, and* τ : T → ℕ *is a* time function *that assigns to every transition a duration.*

**Timed sequences.** We assign to each transition sequence σ of W and each place p a *timestamp* μ(σ)<sub>p</sub> through a *timestamp function* μ : T<sup>∗</sup> → ℕ<sub>⊥</sub><sup>P</sup>. The set ℕ<sub>⊥</sub> is defined by ℕ<sub>⊥</sub> ≝ {⊥} ∪ ℕ with ⊥ ≤ x and ⊥ + x = ⊥ for all x ∈ ℕ<sub>⊥</sub>. Intuitively, if a place p is marked after σ, then μ(σ)<sub>p</sub> records the "arrival time" of the token in p, and if p is unmarked, then μ(σ)<sub>p</sub> = ⊥. When a transition t occurs, it removes all tokens in its preset, and τ(t) time units later, puts tokens into its postset.

<sup>2</sup> The semantics of the model can be defined in the same way for both discrete and continuous time, but, since our results only concern discrete time, we only consider this case.

Formally, for the empty sequence ε we define μ(ε)<sub>i</sub> ≝ 0 and μ(ε)<sub>p</sub> ≝ ⊥ for p ≠ i, and we set μ(σt) ≝ upd(μ(σ), t), where the update function upd : ℕ<sub>⊥</sub><sup>P</sup> × T → ℕ<sub>⊥</sub><sup>P</sup> is given by:

$$\operatorname{upd}(\boldsymbol{x},t)\_p \stackrel{\text{def}}{=} \begin{cases} \max\_{q \in {}^\bullet t} \boldsymbol{x}\_q + \tau(t) & \text{if } p \in t^\bullet \\ \bot & \text{if } p \in {}^\bullet t \setminus t^\bullet \\ \boldsymbol{x}\_p & \text{if } p \notin {}^\bullet t \cup t^\bullet \end{cases}$$

We then define *tm*(σ) ≝ max<sub>p∈P</sub> μ(σ)<sub>p</sub> as the time needed to fire σ. Further, ⟦*x*⟧ ≝ {p ∈ P | *x*<sub>p</sub> ≠ ⊥} is the marking represented by a timestamp *x* ∈ ℕ<sub>⊥</sub><sup>P</sup>.

*Example 1.* The net in Fig. 1a is a TPWN. Weights are shown in red next to transitions, and times are written in blue into the transitions. For the sequence σ<sub>1</sub> = t<sub>1</sub>t<sub>3</sub>t<sub>4</sub>t<sub>5</sub>, we have *tm*(σ<sub>1</sub>) = 9, and for σ<sub>2</sub> = t<sub>1</sub>t<sub>2</sub>t<sub>3</sub>t<sub>4</sub>t<sub>5</sub>, we have *tm*(σ<sub>2</sub>) = 10. Observe that the time taken by the sequences is *not* equal to the sum of the durations of the transitions.
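The timestamp function can be sketched in a few lines of Python. The presets, postsets and durations below are assumptions inferred from the values reported in the examples of this paper (Fig. 1a is not reproduced in the text), and `BOT` is a hypothetical stand-in for ⊥.

```python
BOT = None  # stands for the undefined timestamp ⊥

# Assumed net structure and durations, inferred from the worked examples.
PLACES = ("i", "p1", "p2", "p3", "p4", "o")
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}

def upd(x, t):
    """Fire t on timestamps x: clear the preset, stamp the postset."""
    if any(x[q] is BOT for q in PRE[t]):
        raise ValueError(f"{t} is not enabled")
    done = max(x[q] for q in PRE[t]) + TAU[t]
    y = dict(x)
    for p in PRE[t]:
        y[p] = BOT
    for p in POST[t]:  # written last, so places in the postset always get `done`
        y[p] = done
    return y

def mu(sigma):
    """Timestamps after firing the sequence sigma from the initial marking."""
    x = {p: BOT for p in PLACES}
    x["i"] = 0
    for t in sigma:
        x = upd(x, t)
    return x

def tm(sigma):
    """Time needed to fire sigma: the largest defined timestamp."""
    return max(v for v in mu(sigma).values() if v is not BOT)

sigma1 = ["t1", "t3", "t4", "t5"]
sigma2 = ["t1", "t2", "t3", "t4", "t5"]
print(tm(sigma1), sum(TAU[t] for t in sigma1))  # 9 11
print(tm(sigma2), sum(TAU[t] for t in sigma2))  # 10 15
```

Under these assumed durations the sketch reproduces the two values of Example 1, and the second number on each line shows that the sum of the durations overestimates the time, because t3 and t4 run in parallel.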

**Markov Decision Process semantics.** A *Markov Decision Process* (MDP) is a tuple M = (Q, q<sub>0</sub>, *Steps*) where Q is a finite set of states, q<sub>0</sub> ∈ Q is the initial state, and *Steps* : Q → 2<sup>dist(Q)</sup> is the probability transition function. Paths of an MDP, schedulers, and the probability measure of paths compatible with a scheduler are defined as usual (see the Appendix of the full version [19]).

The semantics of a TPWN W is a Markov Decision Process *MDP*W. The states of *MDP*<sup>W</sup> are either markings <sup>M</sup> or pairs (M, t), where <sup>t</sup> is a transition enabled at M. The intended meanings of M and (M, t) are "the current marking is M", and "the current marking is M, and t has been selected to fire next." Intuitively, t is chosen in two steps: first, a conflict set enabled at M is chosen nondeterministically, and then a transition of this set is chosen at random, with probability proportional to its weight.

Formally, let W = (**N**, w, τ) be a TPWN where **N** = (P, T, F, i, o), let M be a reachable marking of W enabling at least one transition, and let C be a conflict set of M. Let w(C) be the sum of the weights of the transitions in C. The *probability distribution* P<sub>M,C</sub> *over* T is given by P<sub>M,C</sub>(t) = w(t)/w(C) if t ∈ C and P<sub>M,C</sub>(t) = 0 otherwise. Now, let ℳ be the set of 1-safe markings of W, and let E be the set of pairs (M, t) such that M ∈ ℳ and M enables t. We define the Markov decision process *MDP*<sub>W</sub> = (Q, q<sub>0</sub>, *Steps*), where Q = ℳ ∪ E, q<sub>0</sub> = **i**, the initial marking of W, and *Steps* is defined for the states of ℳ and E as follows. For every M ∈ ℳ,


For every (M, t) ∈ E, *Steps*(M, t) contains one single distribution that assigns probability 1 to the marking M′ such that M −<sup>t</sup>→ M′, and probability 0 to every other state.

**Fig. 1.** A TPWN and its associated MDP. (Color figure online)

*Example 2.* Figure 1b shows a graphical representation of the MDP of the TPWN in Fig. 1a. Black nodes represent states, white nodes probability distributions. A black node q has a white successor for each probability distribution in *Steps*(q). A white node λ has a black successor for each node q such that λ(q) > 0; the arrow leading to this black successor is labeled with λ(q), unless λ(q) = 1, in which case there is no label. States (M, t) are abbreviated to t.

**Schedulers.** Given a TPWN W, a scheduler of *MDP*<sub>W</sub> is a function γ : T<sup>∗</sup> → 2<sup>T</sup> assigning to each firing sequence **i** −<sup>σ</sup>→ M with C(M) ≠ ∅ a conflict set γ(σ) ∈ C(M). A firing sequence **i** −<sup>σ</sup>→ M is *compatible* with a scheduler γ if for every decomposition σ = σ<sub>1</sub> t σ<sub>2</sub>, where t is a transition, we have t ∈ γ(σ<sub>1</sub>).

*Example 3.* In the TPWN of Fig. 1a, after firing t<sub>1</sub> two conflict sets become concurrently enabled: {t<sub>2</sub>, t<sub>3</sub>} and {t<sub>4</sub>}. A scheduler picks one of the two. If the scheduler picks {t<sub>2</sub>, t<sub>3</sub>} then t<sub>2</sub> may occur, and in this case, since firing t<sub>2</sub> does not change the marking, the scheduler again chooses one of {t<sub>2</sub>, t<sub>3</sub>} and {t<sub>4</sub>}. So there are infinitely many possible schedulers, differing only in how many times they pick {t<sub>2</sub>, t<sub>3</sub>} before picking {t<sub>4</sub>}.

**Definition 2 ((Expected) Time until a state is reached).** *Let* π *be an infinite path of MDP*<sub>W</sub>*, and let* M *be a reachable marking of* W*. Observe that* M *is a state of MDP*<sub>W</sub>*. The* time needed to reach M along π*, denoted tm*(M, π)*, is defined as follows: If* π *does not visit* M*, then tm*(M, π) ≝ ∞*; otherwise, tm*(M, π) ≝ *tm*(Σ(π′))*, where* Σ(π′) *is the transition sequence corresponding to the shortest prefix* π′ *of* π *ending at* M*. Given a scheduler* S*, the expected time until reaching* M *is defined as*

$$ET^S\_{\mathcal{W}}(M) \stackrel{def}{=} \sum\_{\pi \in Paths^S} tm(M, \pi) \cdot Prob^S(\pi).$$

*and the expected time* ET<sub>W</sub><sup>S</sup> *is defined as* ET<sub>W</sub><sup>S</sup> ≝ ET<sub>W</sub><sup>S</sup>(**o**)*, i.e. the expected time until reaching the final marking.*

In [11] we proved a result for Probabilistic Workflow Nets (PWNs) with rewards, showing that the expected reward of a PWN is independent of the scheduler (intuitively, this is the case because in a confusion-free Petri net the scheduler only determines the logical order in which transitions occur, but not which transitions occur). Despite the fact that, contrary to rewards, the execution time of a firing sequence is not the sum of the execution times of its transitions, the proof carries over to the expected time with only minor modifications.

**Theorem 1.** *Let* W *be a TPWN.*


By this theorem, the expected time ET<sub>W</sub> can be computed by choosing a suitable scheduler S, and computing ET<sub>W</sub><sup>S</sup>.

#### **4 Computation of the Expected Time**

We show how to compute the expected time of a TPWN. We fix an appropriate scheduler, show that it induces a finite-state Markov chain, define an appropriate reward function for the chain, and prove that the expected time is equal to the expected reward.

#### **4.1 Earliest-First Scheduler**

Consider a firing sequence **i** −<sup>σ</sup>→ M. We define the *starting time* of a conflict set C ∈ C(M) as the earliest time at which the transitions of C become enabled. This occurs after *all* tokens of •C arrive<sup>3</sup>, and so the starting time of C is the maximum of μ(σ)<sub>p</sub> for p ∈ •C (recall that μ(σ)<sub>p</sub> is the latest time at which a token arrives at p while firing σ).

Intuitively, the "earliest-first" scheduler always chooses the conflict set with the earliest starting time (if there are multiple such conflict sets, the scheduler chooses any one of them). Formally, recall that a scheduler is a mapping γ : T<sup>∗</sup> → 2<sup>T</sup> such that for every firing sequence **i** −<sup>σ</sup>→ M, the set γ(σ) is a conflict set of M. We define the *earliest-first scheduler* γ by:

$$\gamma(\sigma) \stackrel{\text{def}}{=} \underset{C \in \mathcal{C}(M)}{\text{arg min }} \max\_{p \in {}^\bullet C} \mu(\sigma)\_p \qquad \text{where } M \text{ is given by } \boldsymbol{i} \stackrel{\sigma}{\to} M.$$

<sup>3</sup> This is proved in Lemma 7 in the Appendix of the full version [19].

*Example 4.* Figure 2a shows the Markov chain induced by the "earliest-first" scheduler defined above in the MDP of Fig. 1b. Initially we have a token at i with arrival time 0. After firing t<sub>1</sub>, which takes time 1, we obtain tokens in p<sub>1</sub> and p<sub>3</sub> with arrival time 1. In particular, the conflict sets {t<sub>2</sub>, t<sub>3</sub>} and {t<sub>4</sub>} become enabled at time 1. The scheduler can choose any of them, because they have the same starting time. Assume it chooses {t<sub>2</sub>, t<sub>3</sub>}. The Markov chain now branches into two transitions, corresponding to firing t<sub>2</sub> and t<sub>3</sub> with probabilities 1/5 and 4/5, respectively. Consider the branch in which t<sub>2</sub> fires. Since t<sub>2</sub> starts at time 1 and takes 4 time units, it removes the token from p<sub>1</sub> at time 1, and adds a new token to p<sub>1</sub> with arrival time 5; the token at p<sub>3</sub> is not affected, and it keeps its arrival time of 1. So we have μ(t<sub>1</sub>t<sub>2</sub>) = {p<sub>1</sub> ↦ 5, p<sub>3</sub> ↦ 1} (meaning μ(t<sub>1</sub>t<sub>2</sub>)<sub>p<sub>1</sub></sub> = 5, μ(t<sub>1</sub>t<sub>2</sub>)<sub>p<sub>3</sub></sub> = 1, and μ(t<sub>1</sub>t<sub>2</sub>)<sub>p</sub> = ⊥ otherwise). Now the conflict sets {t<sub>2</sub>, t<sub>3</sub>} and {t<sub>4</sub>} are enabled again, but with a difference: while {t<sub>4</sub>} has been enabled since time 1, the set {t<sub>2</sub>, t<sub>3</sub>} is now enabled since time μ(t<sub>1</sub>t<sub>2</sub>)<sub>p<sub>1</sub></sub> = 5. The scheduler must now choose {t<sub>4</sub>}, leading to the marking that puts tokens on p<sub>1</sub> and p<sub>4</sub> with arrival times μ(t<sub>1</sub>t<sub>2</sub>t<sub>4</sub>)<sub>p<sub>1</sub></sub> = 5 and μ(t<sub>1</sub>t<sub>2</sub>t<sub>4</sub>)<sub>p<sub>4</sub></sub> = 6. In the next steps the scheduler always chooses {t<sub>2</sub>, t<sub>3</sub>} until t<sub>5</sub> becomes enabled. The final marking **o** can be reached after time 9, through t<sub>1</sub>t<sub>3</sub>t<sub>4</sub>t<sub>5</sub> with probability 4/5, or with times 10 + 4k for k ∈ ℕ, through t<sub>1</sub>t<sub>2</sub>t<sub>4</sub>t<sub>2</sub><sup>k</sup>t<sub>3</sub>t<sub>5</sub> with probability (1/5)<sup>k+1</sup> · 4/5 (the times at which the final marking can be reached are written in blue inside the final states).
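As a worked check added here for illustration, the expected time under this scheduler can be summed directly from the probabilities and times listed in Example 4. The calculation uses only the geometric series identities Σ<sub>k≥0</sub> q<sup>k</sup> = 1/(1 − q) and Σ<sub>k≥0</sub> k q<sup>k</sup> = q/(1 − q)<sup>2</sup> with q = 1/5:

$$\frac{4}{5} \cdot 9 + \sum\_{k \ge 0} \left(\frac{1}{5}\right)^{k+1} \cdot \frac{4}{5} \cdot (10 + 4k) = \frac{36}{5} + 2 + \frac{1}{5} = \frac{47}{5} = 9.4$$

The probabilities of the listed runs add up to 4/5 + 1/5 = 1, so no run is missing from the sum.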

Theorem 2 below shows that the earliest-first scheduler only needs finite memory, which is not clear from the definition. The construction is similar to those of [6,15,16]. However, our proof crucially depends on TPWNs being confusion-free.

**Theorem 2.** *Let* H ≝ max<sub>t∈T</sub> τ(t) *be the maximum duration of the transitions of* T*, and let* [H]<sub>⊥</sub> ≝ {⊥, 0, 1, ..., H} ⊆ ℕ<sub>⊥</sub>*. There are functions* ν : T<sup>∗</sup> → [H]<sub>⊥</sub><sup>P</sup> *(compare with* μ : T<sup>∗</sup> → ℕ<sub>⊥</sub><sup>P</sup>*),* f : [H]<sub>⊥</sub><sup>P</sup> × T → [H]<sub>⊥</sub><sup>P</sup> *and* r : [H]<sub>⊥</sub><sup>P</sup> → ℕ *such that for every* σ = t<sub>1</sub> ... t<sub>n</sub> ∈ T<sup>∗</sup> *compatible with* γ *and for every* t ∈ T *enabled by* σ*:*

$$\gamma(\sigma) = \operatorname\*{arg\,min}\_{C \in \mathcal{C}(⟦\nu(\sigma)⟧)} \max\_{p \in {}^\bullet C} \nu(\sigma)\_p \tag{1}$$

$$\nu(\sigma t) = f(\nu(\sigma), t) \tag{2}$$

$$tm(\sigma) = \max\_{p \in P} \nu(\sigma)\_p + \sum\_{k=0}^{n-1} r(\nu(t\_1 \dots t\_k))\tag{3}$$

Observe that, unlike μ, the range of ν is finite. We call it the *finite abstraction* of μ. Equation 1 states that γ can be computed directly from the finite abstraction ν. Equation 2 shows that ν(σt) can be computed from ν(σ) and t. So γ only needs to remember an element of [H]<sub>⊥</sub><sup>P</sup>, which implies that it only requires finite memory. Finally, observe that the function r of Eq. 3 has a finite domain, and so it allows us to use ν to compute the time needed by σ.

(a) Infinite MC for scheduler using μ(σ), with final states labeled by *tm*(σ). (b) Finite MC for scheduler using ν(σ), with states labeled by rewards r(ν(σ)).

**Fig. 2.** Two Markov chains for the "earliest-first" scheduler. (Color figure online)

The formal definition of the functions ν, f, and r is given below, together with the definition of the auxiliary operator ⊖ : ℕ<sub>⊥</sub><sup>P</sup> × ℕ → ℕ<sub>⊥</sub><sup>P</sup>:

$$\begin{aligned} (\boldsymbol{x} \ominus n)\_p &\stackrel{\text{def}}{=} \begin{cases} \max(\boldsymbol{x}\_p - n, 0) & \text{if } \boldsymbol{x}\_p \neq \bot \\ \bot & \text{if } \boldsymbol{x}\_p = \bot \end{cases} \\ f(\boldsymbol{x}, t) &\stackrel{\text{def}}{=} \operatorname{upd}(\boldsymbol{x}, t) \ominus \max\_{p \in {}^\bullet t} \boldsymbol{x}\_p \\ \nu(\epsilon) &\stackrel{\text{def}}{=} \mu(\epsilon), \qquad \nu(\sigma t) \stackrel{\text{def}}{=} f(\nu(\sigma), t) \\ r(\boldsymbol{x}) &\stackrel{\text{def}}{=} \min\_{C \in \mathcal{C}(⟦\boldsymbol{x}⟧)} \max\_{p \in {}^\bullet C} \boldsymbol{x}\_p \end{aligned}$$

*Example 5.* Figure 2b shows the finite-state Markov chain induced by the "earliest-first" scheduler computed using the abstraction ν. Consider the firing sequence t<sub>1</sub>t<sub>3</sub>. We have μ(t<sub>1</sub>t<sub>3</sub>) = {p<sub>2</sub> ↦ 3, p<sub>3</sub> ↦ 1}, i.e. the tokens in p<sub>2</sub> and p<sub>3</sub> arrive at times 3 and 1, respectively. Now we compute ν(t<sub>1</sub>t<sub>3</sub>), which corresponds to the *local* arrival times of the tokens, i.e. the time elapsed *since the last transition starts to fire until the token arrives*. Transition t<sub>3</sub> starts to fire at time 1, and so the local arrival times of the tokens in p<sub>2</sub> and p<sub>3</sub> are 2 and 0, respectively, i.e. we have ν(t<sub>1</sub>t<sub>3</sub>) = {p<sub>2</sub> ↦ 2, p<sub>3</sub> ↦ 0}. Using these local times we compute the local starting time of the conflict sets enabled at {p<sub>2</sub>, p<sub>3</sub>}. The scheduler always chooses the conflict set with earliest local starting time. In Fig. 2b the earliest local starting time of the state reached by firing σ, which is denoted r(ν(σ)), is written in blue inside the state. The theorem above shows that this scheduler always chooses the same conflict sets as the one which uses the function μ, and that the time of a sequence can be obtained by adding the local starting times. This allows us to consider the earliest local starting time of a state as a *reward* associated to the state; then, the time taken by a sequence is equal to the sum of the rewards along the corresponding path of the chain. For example, we have *tm*(t<sub>1</sub>t<sub>2</sub>t<sub>4</sub>t<sub>3</sub>t<sub>5</sub>) = 0 + 1 + 0 + 4 + 2 + 3 = 10.

Finally, let us see how ν(σt) is computed from ν(σ) for σ = t<sub>1</sub>t<sub>2</sub>t<sub>4</sub> and t = t<sub>2</sub>. We have ν(σ) = {p<sub>1</sub> ↦ 4, p<sub>4</sub> ↦ 5}, i.e. the local arrival times for the tokens in p<sub>1</sub> and p<sub>4</sub> are 4 and 5, respectively. Now {t<sub>2</sub>, t<sub>3</sub>} is scheduled next, with local starting time r(ν(σ)) = ν(σ)<sub>p<sub>1</sub></sub> = 4. If t<sub>2</sub> fires, then, since τ(t<sub>2</sub>) = 4, we first add 4 to the time of p<sub>1</sub>, obtaining {p<sub>1</sub> ↦ 8, p<sub>4</sub> ↦ 5}. Second, we subtract 4 from *all* times, to obtain the time elapsed since t<sub>2</sub> started to fire (for local times the origin of time changes every time a transition fires), yielding the final result ν(σt<sub>2</sub>) = {p<sub>1</sub> ↦ 4, p<sub>4</sub> ↦ 1}.
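The local-time bookkeeping of Example 5 can be replayed with a short sketch. It assumes that f subtracts the starting time of the fired transition from the updated timestamps and that r is the earliest local starting time of a conflict set, which is how the example describes them; the net data are again inferred from the examples, and returning 0 for r at a marking without enabled transitions is a convention chosen here only to keep the function total.

```python
BOT = None  # stands for ⊥

# Assumed net structure and durations, inferred from the worked examples.
PLACES = ("i", "p1", "p2", "p3", "p4", "o")
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}

def upd(x, t):
    """Timestamp update: clear the preset of t, stamp its postset."""
    done = max(x[q] for q in PRE[t]) + TAU[t]
    y = dict(x)
    y.update({p: BOT for p in PRE[t]})
    y.update({p: done for p in POST[t]})
    return y

def ominus(x, n):
    """Shift all defined timestamps back by n, clamping at 0."""
    return {p: BOT if v is BOT else max(v - n, 0) for p, v in x.items()}

def f(x, t):
    """Local update: fire t, then move the time origin to the start of t."""
    return ominus(upd(x, t), max(x[q] for q in PRE[t]))

def r(x):
    """Earliest local starting time among the conflict sets enabled at x."""
    marked = {p for p, v in x.items() if v is not BOT}
    enabled = [t for t in PRE if PRE[t] <= marked]
    # Enabled transitions are in conflict iff their presets intersect.
    sets = {frozenset(u for u in enabled if PRE[u] & PRE[t]) for t in enabled}
    return min((max(x[p] for t in c for p in PRE[t]) for c in sets), default=0)

def nu(sigma):
    """Finite abstraction: local timestamps after firing sigma."""
    x = {p: BOT for p in PLACES}
    x["i"] = 0
    for t in sigma:
        x = f(x, t)
    return x

def tokens(x):
    """Drop undefined entries for readable output."""
    return {p: v for p, v in x.items() if v is not BOT}

sigma = ["t1", "t2", "t4", "t3", "t5"]
rewards = [r(nu(sigma[:k])) for k in range(len(sigma))]
print(tokens(nu(["t1", "t3"])))                  # {'p2': 2, 'p3': 0}
print(rewards, max(tokens(nu(sigma)).values()))  # [0, 1, 0, 4, 2] 3
```

Adding the rewards to the largest remaining local timestamp gives 0 + 1 + 0 + 4 + 2 + 3 = 10, in agreement with Eq. 3 and with the value stated in Example 5.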

#### **4.2 Computation in the Probabilistic Case**

Given a TPWN and its corresponding MDP, in the previous section we have defined a finite-state earliest-first scheduler and a reward function of its induced Markov chain. The reward function has the following property: the execution time of a firing sequence compatible with the scheduler is equal to the sum of the rewards of the states visited along it. From the theory of Markov chains with rewards, it follows that the expected accumulated reward until reaching a certain state, provided that this state is reached with probability 1, can be computed by solving a linear equation system. We use this result to compute the expected time ET<sub>W</sub>.

Let W be a sound TPWN. For every firing sequence σ compatible with the earliest-first scheduler γ, the finite-state Markov chain induced by γ contains a state *x* = ν(σ) ∈ [H]<sub>⊥</sub><sup>P</sup>. Let C<sub>*x*</sub> be the conflict set scheduled by γ at *x*. We define a system of linear equations with variables X<sub>*x*</sub>, one for each state *x*:

$$\begin{aligned} X\_x &= r(x) + \sum\_{t \in C\_x} \frac{w(t)}{w(C\_x)} \cdot X\_{f(x,t)} & \quad \text{if } [x] \neq \mathbf{o} \\ X\_x &= \max\_{p \in P} x\_p & \quad \text{if } [x] = \mathbf{o} \end{aligned} \tag{4}$$

The solution of the system is the expected reward of a path leading from *i* to *o*. By the theory of Markov chains with rewards/costs ([4], Chap. 10.5), we have:

**Lemma 2.** *Let* W *be a sound TPWN. Then the system of linear equations (4) has a unique solution* *X**, and* ET<sub>W</sub> = *X*<sub>ν(ε)</sub>*.*
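A minimal sketch of this computation, under stated assumptions, is given below. It explores the states ν(σ) reachable under an earliest-first scheduler, builds system (4), and solves it exactly with rational arithmetic. The net, durations and weights are inferred from the examples (only the ratio 1 : 4 of the weights of t2 and t3 is given by the text), ties between conflict sets are broken lexicographically, and every state without an enabled transition is treated as the final marking, which is only valid for a sound net. This is not the implementation evaluated in Sect. 6.

```python
from fractions import Fraction

BOT = None  # stands for ⊥
PLACES = ("i", "p1", "p2", "p3", "p4", "o")
# Assumed net data, inferred from the worked examples.
PRE = {"t1": {"i"}, "t2": {"p1"}, "t3": {"p1"}, "t4": {"p3"}, "t5": {"p2", "p4"}}
POST = {"t1": {"p1", "p3"}, "t2": {"p1"}, "t3": {"p2"}, "t4": {"p4"}, "t5": {"o"}}
TAU = {"t1": 1, "t2": 4, "t3": 2, "t4": 5, "t5": 3}
W = {"t1": 1, "t2": 1, "t3": 4, "t4": 1, "t5": 1}

def step(x, t):
    """f(x, t) on states encoded as tuples indexed like PLACES."""
    start = max(x[PLACES.index(q)] for q in PRE[t])
    y = []
    for p, v in zip(PLACES, x):
        if p in POST[t]:
            v = start + TAU[t]
        elif p in PRE[t]:
            v = BOT
        y.append(BOT if v is BOT else max(v - start, 0))
    return tuple(y)

def scheduled(x):
    """Return (r(x), chosen conflict set), or None if nothing is enabled."""
    val = dict(zip(PLACES, x))
    marked = {p for p, v in val.items() if v is not BOT}
    en = sorted(t for t in PRE if PRE[t] <= marked)
    if not en:
        return None
    sets = {tuple(u for u in en if PRE[u] & PRE[t]) for t in en}
    return min((max(val[p] for t in c for p in PRE[t]), c) for c in sets)

def expected_time():
    x0 = tuple(0 if p == "i" else BOT for p in PLACES)
    index, todo, rows = {x0: 0}, [x0], {}
    while todo:  # explore the finite Markov chain induced by the scheduler
        x = todo.pop()
        choice = scheduled(x)
        if choice is None:  # assumed to be the final marking (sound net)
            rows[x] = (Fraction(max(v for v in x if v is not BOT)), {})
            continue
        reward, c = choice
        total = sum(W[t] for t in c)
        succ = {}
        for t in c:
            y = step(x, t)
            if y not in index:
                index[y] = len(index)
                todo.append(y)
            succ[y] = succ.get(y, 0) + Fraction(W[t], total)
        rows[x] = (Fraction(reward), succ)
    n = len(index)
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]  # augmented matrix of (4)
    for x, (b, succ) in rows.items():
        i = index[x]
        a[i][i] += 1
        a[i][n] = b
        for y, pr in succ.items():
            a[i][index[y]] -= pr
    for col in range(n):  # Gauss-Jordan elimination over the rationals
        piv = next(k for k in range(col, n) if a[k][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for k in range(n):
            if k != col and a[k][col] != 0:
                factor = a[k][col]
                a[k] = [u - factor * v for u, v in zip(a[k], a[col])]
    return a[index[x0]][n]

print(expected_time())  # 47/5
```

The result 47/5 = 9.4 agrees with summing the times and probabilities listed in Example 4. Exact fractions are used so that the rational solution promised by the lemma is not blurred by floating-point rounding.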

**Theorem 3.** *Let* W *be a TPWN. Then* ET<sub>W</sub> *is either* ∞ *or a rational number and can be computed in single exponential time.*

*Proof.* We assume that the input has size n and all times and weights are given in binary notation. Testing whether W is sound can be done by exploration of the state space of reachable markings in time O(2<sup>n</sup>). If W is unsound, we have ET<sub>W</sub> = ∞.

Now assume that W is sound. By Lemma 2, ET<sub>W</sub> is the solution to the linear equation system (4), which is finite and has rational coefficients, so it is a rational number. The number of variables |*X*| of (4) is bounded by the size of [H]<sub>⊥</sub><sup>P</sup>, and as H = max<sub>t∈T</sub> τ(t) we have |*X*| ≤ (1 + H)<sup>|P|</sup> ≤ (1 + 2<sup>n</sup>)<sup>n</sup> ≤ 2<sup>n²+n</sup>. The linear equation system can be solved in time O(n² · |*X*|³) and therefore in time O(2<sup>p(n)</sup>) for some polynomial p.

### **5 Lower Bounds for the Expected Time**

We analyze the complexity of computing the expected time of a TPWN. Botezatu *et al.* show in [5] that deciding if the expected time exceeds a given bound is NP-hard. However, their reduction produces TPWNs with weights and times of arbitrary size. An open question is whether the expected time can be computed in polynomial time when the times (and weights) must be taken from a finite set. We prove that this is not the case unless P = NP, even if all times are 0 or 1, all weights are 1, the workflow net is sound, acyclic and free-choice, and the size of each conflict set is at most 2 (resulting only in probabilities 1 or 1/2). Further, we show that even computing an ε-approximation is equally hard. These two results are consequences of the main theorem of this section: computing the expected time is #P-hard [23]. For example, counting the number of satisfying assignments for a boolean formula (#SAT) is a #P-complete problem. Therefore a polynomial-time algorithm for a #P-hard problem would imply P = NP.

The problem used for the reduction is defined on PERT networks [9], in the specialized form of *two-state stochastic PERT networks* [17], described below.

**Definition 3.** *A* two-state stochastic PERT network *is a tuple* **PN** = (G, s, t, *p*)*, where* G = (V, E) *is a directed acyclic graph with vertices* V*, representing events, and edges* E*, representing tasks, with a single source vertex* s *and sink vertex* t*, and where the vector* *p* ∈ ℚ<sup>E</sup> *assigns to each edge* e ∈ E *a rational probability* p<sub>e</sub> ∈ [0, 1]*. We assume that all* p<sub>e</sub> *are written in binary.*

*Each edge* e ∈ E *of* **PN** *defines a random variable* X<sub>e</sub> *with distribution* Pr(X<sub>e</sub> = 1) = p<sub>e</sub> *and* Pr(X<sub>e</sub> = 0) = 1 − p<sub>e</sub>*. All* X<sub>e</sub> *are assumed to be independent. The* project duration PD *of* **PN** *is the length of the longest path in the network*

$$PD(\mathbf{PN}) \stackrel{def}{=} \max\_{\pi \in \Pi} \sum\_{e \in \pi} X\_e$$

*where* Π *is the set of paths from vertex* s *to vertex* t*. As this defines a random variable, the* expected project duration *of* **PN** *is then given by* E(PD(**PN**))*.*

*Example 6.* Figure 3a shows a small PERT network (without *p*), where the project duration depends on the paths Π = {e<sub>1</sub>e<sub>3</sub>e<sub>6</sub>, e<sub>1</sub>e<sub>4</sub>e<sub>7</sub>, e<sub>2</sub>e<sub>5</sub>e<sub>7</sub>}.
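For small networks the expected project duration can be computed by brute force, which also illustrates why the problem is hard: the sketch below enumerates all 2<sup>|E|</sup> outcomes of the edge variables. The paths are those of Example 6; the probabilities passed in the usage lines are hypothetical, since the example does not fix *p*.

```python
from fractions import Fraction
from itertools import product

# Paths of the PERT network of Example 6, given as tuples of edge names.
PATHS = [("e1", "e3", "e6"), ("e1", "e4", "e7"), ("e2", "e5", "e7")]

def expected_project_duration(paths, prob):
    """E(PD) by enumerating every 0/1 outcome of the independent edge variables.

    `prob` maps each edge e to Pr(X_e = 1). The running time is exponential
    in the number of edges, so this is only meant for tiny examples.
    """
    edges = sorted(prob)
    expectation = Fraction(0)
    for outcome in product((0, 1), repeat=len(edges)):
        x = dict(zip(edges, outcome))
        weight = Fraction(1)
        for e in edges:
            weight *= prob[e] if x[e] else 1 - prob[e]
        longest = max(sum(x[e] for e in path) for path in paths)
        expectation += weight * longest
    return expectation

# Hypothetical probabilities: every task takes one time unit for sure.
certain = {f"e{k}": Fraction(1) for k in range(1, 8)}
print(expected_project_duration(PATHS, certain))  # 3

# Two parallel single-edge paths, each edge slow with probability 1/2.
coin = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
print(expected_project_duration([("a",), ("b",)], coin))  # 3/4
```

The second call shows the effect of the maximum: the expected duration 3/4 is larger than the expected length 1/2 of either path alone.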

The following problem is #P-hard (from [17], using the results from [20]):

**Given:** A two-state stochastic PERT network **PN**.

**Compute:** The expected project duration E(PD(**PN**)).

**First reduction: 0/1 times, arbitrary weights.** We reduce the problem above to computing the expected time of an acyclic TPWN with 0/1 times but arbitrary weights. Given a two-state stochastic PERT network **PN**, we construct a timed probabilistic workflow net W<sub>**PN**</sub> as follows:


(a) PERT network **PN**. (b) Gadget for e = (u, v) with rational weights p<sub>e</sub>, p̄<sub>e</sub>. (c) Equivalent gadget for e with weights 1 for p<sub>e</sub> = 5/8 = (0.101)<sub>2</sub>. (d) Timed probabilistic workflow net W<sub>**PN**</sub>.

**Fig. 3.** A PERT network and its corresponding timed probabilistic workflow net. The weight p̄ is short for 1 − p. Transitions without annotations have weight 1.

The result of applying this construction to the PERT network from Fig. 3a is shown in Fig. 3d. It is easy to see that this workflow net is sound, as from any reachable marking, we can fire enabled transitions corresponding to the edges and vertices of the PERT network in the topological order of the graph, eventually firing t<sub>t</sub> and reaching **o**. The net is also acyclic and free-choice.

**Lemma 3.** *Let* **PN** *be a two-state stochastic PERT network and let* W<sub>**PN**</sub> *be its corresponding TPWN by the construction above. Then* ET<sub>W<sub>**PN**</sub></sub> = E(PD(**PN**))*.*

**Second reduction: 0/1 times, 0/1 weights.** The network constructed this way already uses times 0 and 1; however, the weights still use arbitrary rational numbers. We now replace the gadget nets from Fig. 3b by equivalent nets where all transitions have weight 1. The idea is to use the binary encoding of the probabilities p<sub>e</sub>, deciding if the time is 0 or 1 by a sequence of coin flips. We assume that p<sub>e</sub> = Σ<sub>i=0</sub><sup>k</sup> 2<sup>−i</sup> p<sub>i</sub> for some k ∈ ℕ and p<sub>i</sub> ∈ {0, 1} for 0 ≤ i ≤ k. The replacement is shown in Fig. 3c for p<sub>e</sub> = 5/8 = (0.101)<sub>2</sub>.
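The coin-flip idea can be illustrated with a short calculation. The scheme below is one possible way to realize a probability p<sub>e</sub> &lt; 1 with fair coins, namely comparing a uniformly random bit string with the binary digits of p<sub>e</sub>; it is an assumption made for illustration, and the gadget of Fig. 3c may be wired differently. The function name and the restriction to the fractional digits p<sub>1</sub>, ..., p<sub>k</sub> are choices made here.

```python
from fractions import Fraction

def prob_time_one(bits):
    """Probability that a chain of fair coin flips decides for "time 1".

    `bits` holds the fractional binary digits p_1, ..., p_k of p_e.
    At digit i a fair coin is flipped:
      * coin 0 and digit 1: the random number is below p_e, decide "time 1";
      * coin 1 and digit 0: the random number is above p_e, decide "time 0";
      * coin equals digit: still undecided, continue with the next digit.
    If all digits are matched, the random number is not below p_e: "time 0".
    """
    prob_one = Fraction(0)
    undecided = Fraction(1)  # probability that all earlier flips matched
    for bit in bits:
        if bit == 1:
            prob_one += undecided / 2  # the flip came up 0 at a 1-digit
        undecided /= 2                 # the flip matched the digit
    return prob_one

print(prob_time_one([1, 0, 1]))  # 5/8, the value used in Fig. 3c
```

Each flip corresponds to a conflict set of size 2 with weights 1, so a chain of k such choices suffices for a probability with k fractional binary digits.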

**Approximating the expected time is #P-hard.** We show that computing an ε-approximation for ET<sub>W</sub> is #P-hard [17,20].

**Theorem 4.** *The following problem is* #P*-hard:*

**Given:** *A sound, acyclic and free-choice TPWN* W *where all transitions* t *satisfy* w(t) = 1*,* τ(t) ∈ {0, 1} *and* |(•t)•| ≤ 2*, and an* ε > 0*.*

**Compute:** *A rational* r *such that* r − ε < ET<sub>W</sub> < r + ε*.*

### **6 Experimental Evaluation**

We have implemented our algorithm to compute the expected time of a TPWN as a package of the tool ProM<sup>4</sup>. It is available via the package manager of the latest nightly build under the package name WorkflowNetAnalyzer.

We evaluated the algorithm on two different benchmarks. All experiments in this section were run on the same machine equipped with an Intel Core i7-6700K CPU and 32 GB of RAM. We measure the actual runtime of the algorithm, split into construction of the Markov chain and solving the linear equation system, and exclude the time overhead due to starting ProM and loading the plugin.

#### **6.1 IBM Benchmark**

We evaluated the tool on a set of 1386 workflow nets extracted from a collection of five libraries of industrial business processes modeled in the IBM WebSphere Business Modeler [12]. All of the 1386 nets in the benchmark libraries are free-choice and therefore confusion-free. We selected the sound and 1-safe nets among them, which are 642 nets. Out of these, 409 are marked graphs, i.e. the size of any conflict set is 1. Out of the remaining 233 nets, 193 are acyclic and 40 cyclic.

As these nets do not come with probabilistic or time information, we annotated transitions with integer weights and times chosen uniformly from different intervals: (1) w(t) = τ(t) = 1, (2) w(t), τ(t) ∈ [1, 10<sup>3</sup>] and (3) w(t), τ(t) ∈ [1, 10<sup>6</sup>]. For each interval, we annotated the transitions of each net with random weights and times, and computed the expected time of all 642 nets.

For all intervals, we computed the expected time for any net in less than 50 ms. The analysis time did not differ much for different intervals. The solving time for the linear equation system is on average 5% of the total analysis time, and at most 68%. The results for the nets with the longest analysis times are given in Table 1. They show that even for nets with a huge state space, thanks to the earliest-first scheduler, only a small number of reachable markings is explored.

<sup>4</sup> http://www.promtools.org/.

**Table 1.** Analysis times and size of the state space $|X|$ for the 4 nets with the highest analysis times, given for each of the three intervals $[1]$, $[10^3]$, $[10^6]$ of possible times. Here, *R*<sup>N</sup> denotes the number of reachable markings of the net.


#### **6.2 Process Mining Case Study**

As a second benchmark, we evaluated the algorithm on a model of a loan application process. We used the data from the BPI Challenge 2017 [8], an event log containing 31509 cases provided by a financial institute, and took as a model of the process the final net from the report of the winner of the academic category [21], a simple model with high fitness and precision w.r.t. the event log.

**Fig. 4.** Net from [21] of process for personal loan applications in a financial institute, annotated with mean waiting times and local trace weights. Black transitions are invisible transitions not appearing in the event log with time 0.


**Table 2.** Expected time, analysis time and state space size for the net in Fig. 4 for various distributions, where memout denotes reaching the memory limit.

Using the ProM plugin "Multi-perspective Process Explorer" [18] we annotated each transition with waiting times and each transition in a conflict set with a local percentage of traces choosing this transition when this conflict set is enabled. The net with mean times and weights as percentages is displayed in Fig. 4.

For a first analysis, we simply set the execution time of each transition deterministically to its mean waiting time. However, note that the two transitions "O Create Offer" and "W Complete application" are executed in parallel, and therefore the distribution of their execution times influences the total expected time. Therefore we also annotated these two transitions with a histogram of possible execution times from each case. Then we split them up into multiple transitions by grouping the times into buckets of a given interval size, where each bucket creates a transition with an execution time equal to the beginning of the interval, and a weight equal to the number of cases with a waiting time contained in the interval. The times for these transitions range from 6 ms to 31 days. As bucket sizes we chose 12, 6, 4, 2 and 1 hour(s). The net always has 14 places and 15 reachable markings, but a varying number of transitions depending on the chosen bucket size. For the net with the mean as the deterministic time and for the nets with histograms for each bucket size, we then analyzed the expected execution time using our algorithm.
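The bucketing step described above can be sketched as follows. This is a minimal illustration rather than the plugin's actual code: the function name `bucket_transitions`, the use of seconds as the time unit, and the sample waiting times are all assumptions made for the example.

```python
from collections import Counter

def bucket_transitions(waiting_times, bucket_size):
    """Group waiting times into buckets [i*size, (i+1)*size).

    Returns one (execution_time, weight) pair per non-empty bucket, where the
    execution time is the beginning of the interval and the weight is the
    number of cases whose waiting time falls into it.
    """
    counts = Counter(t // bucket_size for t in waiting_times)
    return [(index * bucket_size, n) for index, n in sorted(counts.items())]

HOUR = 3600  # seconds; the unit is an assumption of this sketch

# Made-up waiting times (in seconds) for a single transition.
SAMPLE = [1800, 3240, 4320, 18000, 19800, 21240]

ONE_HOUR = bucket_transitions(SAMPLE, 1 * HOUR)
TWO_HOURS = bucket_transitions(SAMPLE, 2 * HOUR)
print(ONE_HOUR)
print(TWO_HOURS)
```

Smaller buckets keep more of the histogram's shape but create more transitions, which is the trade-off between precision and system size explored with the bucket sizes above.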

The results are given in Table 2. They show that using the complete distribution of times instead of only the mean can lead to much more precise results. When the linear equation system becomes very large, the solver time dominates the construction time of the system. This may be because we chose to use an exact solver for sparse linear equation systems. In the future, this could possibly be improved by using an approximative iterative solver.

#### **7 Conclusion**

We have shown that computing the expected time to termination of a probabilistic workflow net in which transition firings have deterministic durations is #P-hard. This is the case even if the net is free-choice, and both probabilities and times can be written down with a constant number of bits. So, surprisingly, computing the expected time is much harder than computing the expected cost, for which there is a polynomial algorithm [11].

We have also presented an exponential algorithm for computing the expected time based on earliest-first schedulers. Its performance depends crucially on the maximal size of conflict sets that can be concurrently enabled. In the most popular suite of industrial benchmarks this number turns out to be small. So, very satisfactorily, the expected time of any of these benchmarks, some of which have hundreds of transitions, can still be computed in milliseconds.

**Acknowledgements.** We thank Hagen Völzer for input on the implementation and choice of benchmarks.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### **Shepherding Hordes of Markov Chains**

Milan Češka<sup>1</sup>, Nils Jansen<sup>2</sup>, Sebastian Junges<sup>3(✉)</sup>, and Joost-Pieter Katoen<sup>3</sup>

<sup>1</sup> Brno University of Technology, Brno, Czech Republic

<sup>2</sup> Radboud University, Nijmegen, The Netherlands

<sup>3</sup> RWTH Aachen University, Aachen, Germany

sebastian.junges@cs.rwth-aachen.de

**Abstract.** This paper considers large families of Markov chains (MCs) that are defined over a set of parameters with finite discrete domains. Such families occur in software product lines, planning under partial observability, and sketching of probabilistic programs. Simple questions, like 'does at least one family member satisfy a property?', are NP-hard. We tackle two problems: distinguish family members that satisfy a given quantitative property from those that do not, and determine a family member that satisfies the property optimally, i.e., with the highest probability or reward. We show that combining two well-known techniques, MDP model checking and abstraction refinement, mitigates the computational complexity. Experiments on a broad set of benchmarks show that in many situations, our approach is able to handle families of millions of MCs, providing superior scalability compared to existing solutions.

#### **1 Introduction**

Randomisation is key to research fields such as dependability (uncertain system components), distributed computing (symmetry breaking), planning (unpredictable environments), and probabilistic programming. Families of alternative designs differing in the structure and system parameters are ubiquitous. Software dependability has to cope with configuration options, in distributed computing the available memory per process is highly relevant, in planning the observability of the environment is pivotal, and program synthesis is all about selecting correct program variants. The automated analysis of such families has to face a formidable challenge—in addition to the state-space explosion affecting each family member, the family size typically grows exponentially in the number of features, options, or observations. This affects the analysis of (quantitative) software product lines [18,28,43,45,46], strategy synthesis in planning under partial observability [12,14,29,36,41], and probabilistic program synthesis [9,13,27,40].

This paper considers families of Markov chains (MCs) to describe configurable probabilistic systems. We consider finite MC families with finite-state family members. Family members may have different transition probabilities and distinct topologies, and thus different reachable state spaces. The latter aspect goes beyond the class of parametric MCs as considered in parameter synthesis [10,22,24,31] and model repair [6,16,42].

This work has been supported by the DFG RTG 2236 "UnRAVeL" and the Czech Science Foundation grant No. Robust 17-12465S.

© The Author(s) 2019

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 172–190, 2019. https://doi.org/10.1007/978-3-030-17465-1\_10

For an MC family D and quantitative specification ϕ, with ϕ a reachability probability or expected reward objective, we consider the following synthesis problems: (a) does some member in D satisfy a threshold on ϕ? (aka: *feasibility* synthesis), (b) which members of D satisfy this threshold on ϕ and which ones do not? (aka: *threshold synthesis*), and (c) which family member(s) satisfy ϕ optimally, e.g., with highest probability? (aka: *optimal synthesis*).

The simplest synthesis problem, feasibility, is NP-complete and can naively be solved by analysing all individual family members—the so-called *one-by-one* approach. This approach has been used in [18] (and for qualitative systems in e.g. [19]), but is infeasible for large systems. An alternative is to model the family D by a single Markov decision process (MDP)—the so-called *all-in-one* MDP [18]. The initial MDP state non-deterministically chooses a family member of D, and then evolves in the MC of that member. This approach has been implemented in tools such as ProFeat [18], and for purely qualitative systems in [20]. The MDP representation avoids the individual analysis of all family members, but its size is proportional to the family size. This approach therefore does not scale to large families. A symbolic BDD-based approach is only a partial solution as family members may induce different reachable state-sets.

This paper introduces an *abstraction-refinement* scheme over the MDP representation<sup>1</sup>. The abstraction *forgets* in which family member the MDP operates. The resulting *quotient* MDP has a single representative for every reachable state in a family member. It typically provides a very compact representation of the family D and its analysis using off-the-shelf MDP model-checking algorithms yields a speed-up compared to the all-in-one approach. Verifying the quotient MDP yields under- and over-approximations of the min and max probability (or reward), respectively. These bounds are safe as all *consistent* schedulers, i.e., those that pick actions according to a single family member, are contained in all schedulers considered on the quotient MDP. (CEGAR-based MDP model checking for partial information schedulers, a slightly different notion than restricting schedulers to consistent ones, has been considered in [30]. In contrast to our setting, [30] considers history-dependent schedulers and in this general setting no guarantee can be given that bounds on suprema converge [29]).

Model-checking results of the quotient MDP do provide useful insights. This is evident if the resulting scheduler is consistent. If the verification reveals that the min probability exceeds r for a specification ϕ with a ≤ r threshold, then all family members violate ϕ, even though the minimum also ranges over inconsistent schedulers. If the model checking is inconclusive, i.e., the abstraction is too coarse, we iteratively refine the quotient MDP by splitting the family into sub-families. We do so in an efficient manner that avoids rebuilding the sub-families. Refinement employs a light-weight analysis of the model-checking results.

<sup>1</sup> Classical CEGAR for model checking of software product lines has been proposed in [21]. This uses feature transition systems, is purely qualitative, and exploits existential state abstraction.

We implemented our abstraction-refinement approach using the Storm model checker [25]. Experiments with case studies from software product lines, planning, and distributed computing yield speed-ups of up to three orders of magnitude over the one-by-one and all-in-one approaches (both symbolic and explicit). Some benchmarks include families of millions of MCs whose members have thousands of states. The experiments reveal that, as opposed to parameter synthesis [10,24,31], the threshold has a major influence on the synthesis times.

To summarise, this work presents: (a) MDP-based abstraction-refinement for various synthesis problems over large families of MCs, (b) a refinement strategy that mitigates the overhead of analysing sub-families, and (c) experiments showing substantial speed-ups for many benchmarks. Extra material can be found in [1,11].

### **2 Preliminaries**

We present the basic foundations for this paper; for details, we refer to [4,5].

*Probabilistic models.* A *probability distribution* over a finite or countably infinite set $X$ is a function $\mu\colon X \to [0, 1]$ with $\sum_{x \in X} \mu(x) = \mu(X) = 1$. The set of all distributions on $X$ is denoted $\mathit{Distr}(X)$. The support of a distribution $\mu$ is $\mathrm{supp}(\mu) = \{x \in X \mid \mu(x) > 0\}$. A distribution is *Dirac* if $|\mathrm{supp}(\mu)| = 1$.

**Definition 1 (MC).** *A* discrete-time Markov chain *(MC)* $D$ *is a triple* $(S, s_0, \mathbf{P})$*, where* $S$ *is a finite set of states,* $s_0 \in S$ *is an initial state, and* $\mathbf{P}\colon S \to \mathit{Distr}(S)$ *is a transition probability matrix.*

MCs have unique distributions over successor states at each state. Adding nondeterministic choices over distributions leads to Markov decision processes.

**Definition 2 (MDP).** *A* Markov decision process *(MDP) is a tuple* $M = (S, s_0, \mathit{Act}, \mathcal{P})$ *where* $S, s_0$ *are as in Definition 1,* $\mathit{Act}$ *is a finite set of actions, and* $\mathcal{P}\colon S \times \mathit{Act} \rightharpoonup \mathit{Distr}(S)$ *is a partial transition probability function.*

The *available actions* in $s \in S$ are $\mathit{Act}(s) = \{a \in \mathit{Act} \mid \mathcal{P}(s, a) \neq \bot\}$. An MDP with $|\mathit{Act}(s)| = 1$ for all $s \in S$ is an MC. For MCs (and MDPs), a state-reward function is $\mathit{rew}\colon S \to \mathbb{R}_{\geq 0}$. The reward $\mathit{rew}(s)$ is earned upon leaving $s$.

A *path* of an MDP $M$ is an (in)finite sequence $\pi = s_0 \xrightarrow{a_0} s_1 \xrightarrow{a_1} \cdots$, where $s_i \in S$, $a_i \in \mathit{Act}(s_i)$, and $\mathcal{P}(s_i, a_i)(s_{i+1}) \neq 0$ for all $i \in \mathbb{N}$. For finite $\pi$, $\mathrm{last}(\pi)$ denotes the last state of $\pi$. The set of (in)finite paths of $M$ is $\mathrm{Paths}^{M}_{\mathit{fin}}$ ($\mathrm{Paths}^{M}$). The notions of paths carry over to MCs (actions are omitted). Schedulers resolve all choices of actions in an MDP and yield MCs.

**Definition 3 (Scheduler).** *A* scheduler *for an MDP* $M = (S, s_0, \mathit{Act}, \mathcal{P})$ *is a function* $\sigma\colon \mathrm{Paths}^{M}_{\mathit{fin}} \to \mathit{Act}$ *such that* $\sigma(\pi) \in \mathit{Act}(\mathrm{last}(\pi))$ *for all* $\pi \in \mathrm{Paths}^{M}_{\mathit{fin}}$*. Scheduler* $\sigma$ *is* memoryless *if* $\mathrm{last}(\pi) = \mathrm{last}(\pi') \implies \sigma(\pi) = \sigma(\pi')$ *for all* $\pi, \pi' \in \mathrm{Paths}^{M}_{\mathit{fin}}$*. The set of all schedulers of* $M$ *is* $\Sigma^{M}$*.*

**Definition 4 (Induced Markov Chain).** *The MC* induced by MDP $M$ and $\sigma \in \Sigma^{M}$ *is given by* $M_{\sigma} = (\mathrm{Paths}^{M}_{\mathit{fin}}, s_0, \mathbf{P}^{\sigma})$ *where:*

$$\mathbf{P}^{\sigma}(\pi,\pi') = \begin{cases} \mathcal{P}(\text{last}(\pi),\sigma(\pi))(s') & \text{if } \pi' = \pi \xrightarrow{\sigma(\pi)} s' \\ 0 & \text{otherwise.} \end{cases}$$

*Specifications.* For an MC $D$, we consider unbounded reachability specifications of the form $\phi = \mathbb{P}_{\sim\lambda}(\lozenge G)$ with $G \subseteq S$ a set of goal states, $\lambda \in [0, 1] \subseteq \mathbb{R}$, and ${\sim} \in \{<, \leq, \geq, >\}$. The probability to satisfy the path formula $\varphi = \lozenge G$ in $D$ is denoted by $\mathrm{Prob}(D, \varphi)$. If $\phi$ holds for $D$, that is, $\mathrm{Prob}(D, \varphi) \sim \lambda$, we write $D \models \phi$. Analogously, we define expected reward specifications of the form $\phi = \mathbb{E}_{\sim\kappa}(\lozenge G)$ with $\kappa \in \mathbb{R}_{\geq 0}$. We refer to $\lambda$/$\kappa$ as *thresholds*. While we only introduce reachability specifications, our approaches may be extended to richer logics like arbitrary PCTL [32], PCTL\* [3], or ω-regular properties.

For an MDP $M$, a specification $\phi$ holds ($M \models \phi$) if and only if it holds for the induced MCs of all schedulers. The maximum probability $\mathrm{Prob}^{\max}(M, \varphi)$ to satisfy a path formula $\varphi$ for an MDP $M$ is given by a maximising scheduler $\sigma^{\max} \in \Sigma^{M}$, that is, there is no scheduler $\sigma' \in \Sigma^{M}$ such that $\mathrm{Prob}(M_{\sigma^{\max}}, \varphi) < \mathrm{Prob}(M_{\sigma'}, \varphi)$. Analogously, we define the minimising probability $\mathrm{Prob}^{\min}(M, \varphi)$, and the maximising (minimising) expected reward $\mathrm{ExpRew}^{\max}(M, \varphi)$ ($\mathrm{ExpRew}^{\min}(M, \varphi)$).

The probability (expected reward) to satisfy path formula $\varphi$ from state $s \in S$ in MC $D$ is $\mathrm{Prob}(D, \varphi)(s)$ ($\mathrm{ExpRew}(D, \varphi)(s)$). The notation is analogous for maximising and minimising probability and expected reward measures in MDPs. Note that the expected reward $\mathrm{ExpRew}(D, \varphi)$ to satisfy path formula $\varphi$ is only defined if $\mathrm{Prob}(D, \varphi) = 1$. Accordingly, the expected reward for MDP $M$ under scheduler $\sigma \in \Sigma^{M}$ requires $\mathrm{Prob}(M_{\sigma}, \varphi) = 1$.

#### **3 Families of MCs**

We present our approaches on the basis of an explicit representation of a *family of MCs* using a parametric transition probability function. While arbitrary probabilistic programs allow for more modelling freedom and complex parameter structures, the explicit representation simplifies the presentation and still allows us to reason about practically interesting synthesis problems. In our implementation, we use a more flexible high-level modelling language, cf. Sect. 5.

**Definition 5 (Family of MCs).** *A* family of MCs *is defined as a tuple* $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ *where* $S$ *is a finite set of states,* $s_0 \in S$ *is an initial state,* $K$ *is a finite set of discrete parameters such that the domain of each parameter* $k \in K$ *is* $T_k \subseteq S$*, and* $\mathfrak{P}\colon S \to \mathit{Distr}(K)$ *is a family of transition probability matrices.*

The transition probability function of MCs maps states to distributions over successor states. For families of MCs, this function maps states to distributions over parameters. Instantiating each of these parameters with a value from its domain yields a "concrete" MC, called a *realisation*.

**Fig. 1.** The four different realisations of D.

**Definition 6 (Realisation).** *A* realisation *of a family* $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ *is a function* $r\colon K \to S$ *where* $\forall k \in K\colon r(k) \in T_k$*. A realisation* $r$ *yields an MC* $D_r = (S, s_0, \mathfrak{P}(r))$*, where* $\mathfrak{P}(r)$ *is the transition probability matrix in which each* $k \in K$ *in* $\mathfrak{P}$ *is replaced by* $r(k)$*. Let* $\mathcal{R}^{\mathfrak{D}}$ *denote the* set of all realisations *for* $\mathfrak{D}$*.*

As a family $\mathfrak{D}$ of MCs is defined over finite parameter domains, the number of family members (i.e., realisations from $\mathcal{R}^{\mathfrak{D}}$) of $\mathfrak{D}$ is finite, viz. $|\mathfrak{D}| := |\mathcal{R}^{\mathfrak{D}}| = \prod_{k \in K} |T_k|$, but exponential in $|K|$. Subsets of $\mathcal{R}^{\mathfrak{D}}$ induce so-called *subfamilies* of $\mathfrak{D}$. While all these MCs share the same state space, their *reachable* states may differ, as demonstrated by the following example.

*Example 1 (Family of MCs).* Consider a family of MCs $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ where $S = \{0, 1, 2, 3\}$, $s_0 = 0$, and $K = \{k_0, k_1, k_2\}$ with domains $T_{k_0} = \{0\}$, $T_{k_1} = \{0, 1\}$, and $T_{k_2} = \{2, 3\}$. The parametric transition function $\mathfrak{P}$ is defined by:

$$\begin{aligned} \mathfrak{P}(0) &= 0.5 \colon k\_0 + 0.5 \colon k\_1 & \mathfrak{P}(1) = 0.5 \colon k\_1 + 0.5 \colon k\_2 \\ \mathfrak{P}(2) &= 1 \colon k\_2 & \mathfrak{P}(3) = 0.5 \colon k\_1 + 0.5 \colon k\_2 \end{aligned}$$

Figure 1 shows the four MCs that result from the realisations $\{r_1, r_2, r_3, r_4\} = \mathcal{R}^{\mathfrak{D}}$ of $\mathfrak{D}$. States that are unreachable from the initial state are greyed out.
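To make Definitions 5 and 6 concrete, the family of Example 1 can be encoded and instantiated as in the sketch below. The dictionary encoding and the helper names (`realisations`, `instantiate`, `reachable`) are assumptions of this illustration, and the enumeration order need not coincide with the indexing $r_1, \dots, r_4$ used in Fig. 1.

```python
from itertools import product

# Example 1: parameter domains T_k and the parametric function P(s),
# written as lists of (probability, parameter) pairs.
DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
FAMILY = {
    0: [(0.5, "k0"), (0.5, "k1")],
    1: [(0.5, "k1"), (0.5, "k2")],
    2: [(1.0, "k2")],
    3: [(0.5, "k1"), (0.5, "k2")],
}

def realisations(domains):
    """All functions r: K -> S with r(k) in T_k, as dictionaries."""
    names = sorted(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]

def instantiate(family, r):
    """The MC D_r as {state: {successor: probability}}."""
    mc = {}
    for s, dist in family.items():
        row = {}
        for p, k in dist:
            # Two parameters may map to the same successor; add up their mass.
            row[r[k]] = row.get(r[k], 0.0) + p
        mc[s] = row
    return mc

def reachable(mc, s0=0):
    """States reachable from s0 via transitions with positive probability."""
    seen, stack = {s0}, [s0]
    while stack:
        s = stack.pop()
        for t, p in mc[s].items():
            if p > 0 and t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

ALL = realisations(DOMAINS)
for r in ALL:
    print(r, sorted(reachable(instantiate(FAMILY, r))))
```

The number of realisations is the product of the domain sizes, here $1 \cdot 2 \cdot 2 = 4$, and the reachable state sets differ between them even though all four MCs share the state space $S$.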

We state two synthesis problems for families of MCs. The first is to identify the set of MCs satisfying and violating a given specification, respectively. The second is to find an MC that maximises/minimises a given objective. We call these two problems *threshold synthesis* and *max/min synthesis*.

**Problem 1 (Threshold synthesis).** *Let* $\mathfrak{D}$ *be a family of MCs and* $\phi$ *a probabilistic reachability or expected reward specification. The* threshold synthesis problem *is to partition* $\mathcal{R}^{\mathfrak{D}}$ *into* $T$ *and* $F$ *such that* $\forall r \in T\colon D_r \models \phi$ *and* $\forall r \in F\colon D_r \not\models \phi$*.*

As a special case of the threshold synthesis problem, the *feasibility synthesis problem* is to find just one realisation $r \in \mathcal{R}^{\mathfrak{D}}$ such that $D_r \models \phi$.

**Problem 2 (Max synthesis).** *Let* $\mathfrak{D}$ *be a family of MCs and* $\varphi = \lozenge G$ *for* $G \subseteq S$*. The* max synthesis problem *is to find a realisation* $r^{\ast} \in \mathcal{R}^{\mathfrak{D}}$ *such that* $\mathrm{Prob}(D_{r^{\ast}}, \varphi) = \max_{r \in \mathcal{R}^{\mathfrak{D}}} \{\mathrm{Prob}(D_r, \varphi)\}$*. The problem is defined analogously for an expected reward measure or minimising realisations.*

*Example 2 (Synthesis problems).* Recall the family of MCs $\mathfrak{D}$ from Example 1. For the specification $\phi = \mathbb{P}_{\geq 0.1}(\lozenge\{1\})$, the solution to the threshold synthesis problem is $T = \{r_2, r_3\}$ and $F = \{r_1, r_4\}$, as the goal state 1 is not reachable for $D_{r_1}$ and $D_{r_4}$. For $\varphi = \lozenge\{1\}$, the solution to the max synthesis problem on $\mathfrak{D}$ is $r_2$ or $r_3$, as $D_{r_2}$ and $D_{r_3}$ have probability one to reach state 1.

**Approach 1 (One-by-one** [18]**).** *A straightforward solution to both synthesis problems is to enumerate all realisations* $r \in \mathcal{R}^{\mathfrak{D}}$*, model check the MCs* $D_r$*, and either compare all results with the given threshold or determine the maximum.*

We already saw that the number of realisations is exponential in |K|.
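As an illustration of the one-by-one approach, the sketch below solves the threshold synthesis problem of Example 2 by model checking each realisation separately. It is not the implementation evaluated in the paper: plain value iteration stands in for a model checker, and the encoding and function names are assumptions of this example.

```python
from itertools import product

# Example 1, encoded as (probability, parameter) pairs per state.
DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
FAMILY = {
    0: [(0.5, "k0"), (0.5, "k1")],
    1: [(0.5, "k1"), (0.5, "k2")],
    2: [(1.0, "k2")],
    3: [(0.5, "k1"), (0.5, "k2")],
}

def realisations(domains):
    names = sorted(domains)
    return [dict(zip(names, values))
            for values in product(*(domains[n] for n in names))]

def reach_probability(family, r, goal, s0=0, sweeps=500):
    """Approximate Prob(D_r, <>goal) by value iteration from below.

    The iteration approaches the least fixed point of
    x_s = sum_k P(s)(k) * x_{r(k)} with x_g = 1 for goal states g.
    """
    x = {s: float(s in goal) for s in family}
    for _ in range(sweeps):
        x = {s: (1.0 if s in goal
                 else sum(p * x[r[k]] for p, k in family[s]))
             for s in family}
    return x[s0]

def one_by_one(family, domains, goal, lam):
    """Partition all realisations for the specification P_{>= lam}(<>goal)."""
    sat, unsat = [], []
    for r in realisations(domains):
        target = sat if reach_probability(family, r, goal) >= lam else unsat
        target.append(r)
    return sat, unsat

T, F = one_by_one(FAMILY, DOMAINS, goal={1}, lam=0.1)
print("satisfying:", T)
print("violating: ", F)
```

The loop body runs once per realisation, so its cost grows with $\prod_{k \in K} |T_k|$, which is what the remainder of the section sets out to avoid.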

**Theorem 1.** *The feasibility synthesis problem is NP-complete.*

The theorem even holds for almost-sure reachability properties. The proof is a straightforward adaptation of results for augmented interval Markov chains [17, Theorem 3], partial information games [15], or partially observable MDPs [14].

### **4 Guided Abstraction-Refinement Scheme**

In the previous section, we introduced the notion of a family of MCs, two synthesis problems and the one-by-one approach. Yet, for a sufficiently high number of realisations such a straightforward analysis is not feasible. We propose a novel approach allowing us to more efficiently analyse families of MCs.

### **4.1 All-in-one MDP**

We first consider a single MDP that subsumes all individual MCs of a family $\mathfrak{D}$, and is equipped with an appropriate action and state labelling to identify the underlying realisations from $\mathcal{R}^{\mathfrak{D}}$.

**Definition 7 (All-in-one MDP** [18,28,43]**).** *The* all-in-one MDP *of a family* $\mathfrak{D} = (S, s_0, K, \mathfrak{P})$ *of MCs is given as* $M^{\mathfrak{D}} = (S^{\mathfrak{D}}, s_0^{\mathfrak{D}}, \mathit{Act}^{\mathfrak{D}}, \mathcal{P}^{\mathfrak{D}})$ *where* $S^{\mathfrak{D}} = S \times \mathcal{R}^{\mathfrak{D}} \cup \{s_0^{\mathfrak{D}}\}$*,* $\mathit{Act}^{\mathfrak{D}} = \{a^r \mid r \in \mathcal{R}^{\mathfrak{D}}\}$*, and* $\mathcal{P}^{\mathfrak{D}}$ *is defined as follows:*

$$\mathcal{P}^{\mathfrak{D}}(s\_0^{\mathfrak{D}}, a^r)((s\_0, r)) = 1 \quad \text{and} \quad \mathcal{P}^{\mathfrak{D}}((s, r), a^r)((s', r)) = \mathfrak{P}(r)(s)(s').$$

*Example 3 (All-in-one MDP).* Figure 2 shows the all-in-one MDP $M^{\mathfrak{D}}$ for the family $\mathfrak{D}$ of MCs from Example 1. Again, states that are not reachable from the initial state $s_0^{\mathfrak{D}}$ are marked grey. For the sake of readability, we only include the transitions and states that correspond to realisations $r_1$ and $r_2$.

**Fig. 2.** Reachable fragment of the all-in-one MDP $M^{\mathfrak{D}}$ for realisations $r_1$ and $r_2$.

From the (fresh) initial state $s_0^{\mathfrak{D}}$ of the MDP, the choice of an action $a^r$ corresponds to choosing the realisation $r$ and entering the concrete MC $D_r$. This property of the all-in-one MDP is formalised as follows.

**Corollary 1.** *For the all-in-one MDP* $M^{\mathfrak{D}}$ *of family* $\mathfrak{D}$ *of MCs*<sup>2</sup>*:*

$$\{ M^{\mathfrak{D}}_{\sigma^r} \mid \sigma^r \text{ memoryless deterministic scheduler} \} = \{ D_r \mid r \in \mathcal{R}^{\mathfrak{D}} \}.$$

Consequently, the feasibility synthesis problem for $\phi$ has the solution $r \in \mathcal{R}^{\mathfrak{D}}$ iff there exists a memoryless deterministic scheduler $\sigma^r$ such that $M^{\mathfrak{D}}_{\sigma^r} \models \phi$.

**Approach 2 (All-in-one** [18]**).** *Model checking the all-in-one MDP determines max or min probability (or expected reward) for all states, and thereby for all realisations, and thus provides a solution to both synthesis problems.*

As the all-in-one MDP may also be too large for realistic problems, we merely use it as a formal starting point for our abstraction-refinement loop.

#### **4.2 Abstraction**

First, we define a predicate abstraction that at each state of the MDP *forgets* in which realisation we are, i.e., abstracts the second component of a state (s, r).

**Definition 8 (Forgetting).** *Let* $M^{\mathfrak{D}} = (S^{\mathfrak{D}}, s_0^{\mathfrak{D}}, \mathit{Act}^{\mathfrak{D}}, \mathcal{P}^{\mathfrak{D}})$ *be an all-in-one MDP.* Forgetting *is an equivalence relation* ${\sim_f} \subseteq S^{\mathfrak{D}} \times S^{\mathfrak{D}}$ *satisfying*

$$(s, r) \sim_f (s', r') \iff s = s' \quad\text{and}\quad s_0^{\mathfrak{D}} \sim_f (s_0, r) \;\; \forall r \in \mathcal{R}^{\mathfrak{D}}.$$

*Let* $[s]_{\sim}$ *denote the equivalence class w.r.t.* $\sim_f$ *containing state* $s \in S^{\mathfrak{D}}$*.*

*Forgetting induces the* quotient MDP $M^{\mathfrak{D}}_{\sim} = (S^{\mathfrak{D}}_{\sim}, [s_0^{\mathfrak{D}}]_{\sim}, \mathit{Act}^{\mathfrak{D}}, \mathcal{P}^{\mathfrak{D}}_{\sim})$*, where* $\mathcal{P}^{\mathfrak{D}}_{\sim}([s]_{\sim}, a_r)([s']_{\sim}) = \mathfrak{P}(r)(s)(s')$*.*

At each state of the quotient MDP, the actions correspond to any realisation. It includes states that are unreachable in every realisation.

*Remark 1 (Action space).* According to Definition 8, for every state $[s]_{\sim}$ there are $|\mathfrak{D}|$ actions. Many of these actions lead to the same distributions over successor states. In particular, two different realisations $r$ and $r'$ lead to the same distribution in $s$ if $r(k) = r'(k)$ for all $k \in K$ where $\mathfrak{P}(s)(k) \neq 0$. To avoid this spurious blow-up of actions, we *a priori* merge all actions yielding the same distribution.
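The merging criterion of Remark 1 can be sketched on Example 1 as follows: at each state, only the parameters occurring with non-zero probability matter, so one action per assignment of those parameters suffices. The encoding and the name `merged_actions` are assumptions of this illustration, not part of the implementation described in the paper.

```python
from itertools import product

# Example 1, encoded as (probability, parameter) pairs per state.
DOMAINS = {"k0": [0], "k1": [0, 1], "k2": [2, 3]}
FAMILY = {
    0: [(0.5, "k0"), (0.5, "k1")],
    1: [(0.5, "k1"), (0.5, "k2")],
    2: [(1.0, "k2")],
    3: [(0.5, "k1"), (0.5, "k2")],
}

def merged_actions(family, domains):
    """Quotient-MDP actions per state after merging.

    Returns {state: {assignment of relevant parameters: successor distribution}},
    where a parameter k is relevant at s if P(s)(k) != 0.
    """
    actions = {}
    for s, dist in family.items():
        relevant = sorted({k for p, k in dist if p != 0})
        per_state = {}
        for values in product(*(domains[k] for k in relevant)):
            choice = dict(zip(relevant, values))
            row = {}
            for p, k in dist:
                if p != 0:
                    row[choice[k]] = row.get(choice[k], 0.0) + p
            per_state[tuple(choice.items())] = row
        actions[s] = per_state
    return actions

ACTIONS = merged_actions(FAMILY, DOMAINS)
for s, per_state in ACTIONS.items():
    print(s, len(per_state), "merged action(s)")
```

In this small family, states 0 and 2 keep two actions each instead of $|\mathfrak{D}| = 4$, while states 1 and 3 depend on both $k_1$ and $k_2$ and therefore keep all four.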

<sup>2</sup> The original initial state $s_0$ of the family of MCs needs to be the initial state of $M^{\mathfrak{D}}_{\sigma^r}$.

**Fig. 3.** The quotient MDP $M^{\mathfrak{D}}_{\sim}$ for realisations $r_1$ and $r_2$.

In the quotient MDP under forgetting, the available actions allow a scheduler to switch realisations and thereby create induced MCs that differ from any MC in $\mathfrak{D}$. We formalise the notion of a consistent realisation with respect to parameters.

**Definition 9 (Consistent realisation).** *For a family* $\mathfrak{D}$ *of MCs and* $k \in K$*,* $k$-realisation-consistency *is an equivalence relation* ${\approx_k} \subseteq \mathcal{R}^{\mathfrak{D}} \times \mathcal{R}^{\mathfrak{D}}$ *satisfying:*

$$r \approx\_k r' \iff r(k) = r'(k).$$

*Let* $[r]_{\approx_k}$ *denote the equivalence class w.r.t.* $\approx_k$ *containing* $r \in \mathcal{R}^{\mathfrak{D}}$*.*

**Definition 10 (Consistent scheduler).** *For quotient MDP* $M^{\mathfrak{D}}_{\sim}$ *after forgetting and* $k \in K$*, a scheduler* $\sigma \in \Sigma^{M^{\mathfrak{D}}_{\sim}}$ *is* $k$-consistent *if for all* $\pi, \pi' \in \mathrm{Paths}^{M^{\mathfrak{D}}_{\sim}}_{\mathit{fin}}$*:*

$$
\sigma(\pi) = a\_r \land \sigma(\pi') = a\_{r'} \implies r \approx\_k r' \; .
$$

*A scheduler is* K-consistent *(short:* consistent*) if it is* k*-consistent for all* k ∈ K*.*

**Lemma 1.** *For the quotient MDP* $M^{\mathfrak{D}}_{\sim}$ *of family* $\mathfrak{D}$ *of MCs:*

$$\{ (M^{\mathfrak{D}}_{\sim})_{\sigma^{r^{\ast}}} \mid \sigma^{r^{\ast}} \text{ consistent scheduler} \} = \{ D_r \mid r \in \mathcal{R}^{\mathfrak{D}} \}.$$

*Proof (Idea).* For $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$, we construct $\sigma^{r^{\ast}} \in \Sigma^{M^{\mathfrak{D}}_{\sim}}$ such that $\sigma^{r^{\ast}}([s]_{\sim}) = a_r$ for all $s$. Clearly $\sigma^{r^{\ast}}$ is consistent, and $M^{\mathfrak{D}}_{\sigma^r} = (M^{\mathfrak{D}}_{\sim})_{\sigma^{r^{\ast}}}$ is obtained via a map between $(s, r)$ and $[s]_{\sim}$. For $\sigma^{r^{\ast}} \in \Sigma^{M^{\mathfrak{D}}_{\sim}}$, we construct $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$ such that if $\sigma^{r^{\ast}}([s]_{\sim}) = a_r$ then $\sigma^r(s_0^{\mathfrak{D}}) = a^r$. For all other states, we define $\sigma^r((s, r')) = a^{r'}$ independently of $\sigma^{r^{\ast}}$. Then $M^{\mathfrak{D}}_{\sigma^r} = (M^{\mathfrak{D}}_{\sim})_{\sigma^{r^{\ast}}}$ is obtained as above.

The following theorem is a direct corollary: we need to consider exactly the consistent schedulers.

**Theorem 2.** *For all-in-one MDP* $M^{\mathfrak{D}}$ *and specification* $\phi$*, there exists a memoryless deterministic scheduler* $\sigma^r \in \Sigma^{M^{\mathfrak{D}}}$ *such that* $M^{\mathfrak{D}}_{\sigma^r} \models \phi$ *iff there exists a consistent deterministic scheduler* $\sigma^{r^{\ast}} \in \Sigma^{M^{\mathfrak{D}}_{\sim}}$ *such that* $(M^{\mathfrak{D}}_{\sim})_{\sigma^{r^{\ast}}} \models \phi$*.*

*Example 4.* Recall the all-in-one MDP $M^{\mathfrak{D}}$ from Example 3. The quotient MDP $M^{\mathfrak{D}}_{\sim}$ is depicted in Fig. 3. Only the transitions according to realisations $r_1$ and $r_2$ are included. Transitions from previously unreachable states, marked grey in Example 3, are now available due to the abstraction. The scheduler $\sigma \in \Sigma^{M^{\mathfrak{D}}_{\sim}}$ with $\sigma([s_0^{\mathfrak{D}}]_{\sim}) = a_{r_2}$ and $\sigma([1]_{\sim}) = a_{r_1}$ is *not* $k_1$*-consistent* as different values are chosen for $k_1$ by $r_1$ and $r_2$. In the MC $(M^{\mathfrak{D}}_{\sim})_{\sigma}$ induced by $\sigma$ and $M^{\mathfrak{D}}_{\sim}$, the probability to reach state $[2]_{\sim}$ is one, while under realisation $r_1$, state 2 is not reachable.
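The effect described in Example 4 can be reproduced numerically with the sketch below. Since Fig. 1 is not reproduced here, the concrete readings $r_1 = (k_1 \mapsto 0, k_2 \mapsto 2)$ and $r_2 = (k_1 \mapsto 1, k_2 \mapsto 2)$ are assumptions chosen to agree with Examples 2 and 4. The scheduler is simplified to a memoryless map from states to realisations, and its choices in states 2 and 3, which Example 4 leaves open, are filled in arbitrarily.

```python
# Example 1, encoded as (probability, parameter) pairs per state.
FAMILY = {
    0: [(0.5, "k0"), (0.5, "k1")],
    1: [(0.5, "k1"), (0.5, "k2")],
    2: [(1.0, "k2")],
    3: [(0.5, "k1"), (0.5, "k2")],
}

R1 = {"k0": 0, "k1": 0, "k2": 2}  # assumed reading of r_1
R2 = {"k0": 0, "k1": 1, "k2": 2}  # assumed reading of r_2

def inconsistent_parameters(scheduler):
    """Parameters k for which the scheduler is not k-consistent."""
    params = sorted({k for r in scheduler.values() for k in r})
    return [k for k in params
            if len({r[k] for r in scheduler.values()}) > 1]

def reach_probability(family, scheduler, goal, s0=0, sweeps=500):
    """Reachability probability in the MC induced by a memoryless scheduler,
    approximated by value iteration from below."""
    x = {s: float(s in goal) for s in family}
    for _ in range(sweeps):
        x = {s: (1.0 if s in goal
                 else sum(p * x[scheduler[s][k]] for p, k in family[s]))
             for s in family}
    return x[s0]

# Scheduler of Example 4: a_{r2} in the initial state, a_{r1} in state 1.
SIGMA = {0: R2, 1: R1, 2: R1, 3: R1}
# Consistent scheduler that always follows r_1.
ONLY_R1 = {s: R1 for s in FAMILY}

print(inconsistent_parameters(SIGMA), reach_probability(FAMILY, SIGMA, {2}))
print(inconsistent_parameters(ONLY_R1), reach_probability(FAMILY, ONLY_R1, {2}))
```

Under these assumptions the inconsistent scheduler moves from state 0 to state 1 as in $r_2$ and back from state 1 to state 0 as in $r_1$, a combination that no single realisation offers.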

**Approach 3 (Scheduler iteration).** *Enumerating all consistent schedulers for* $M^{\mathfrak{D}}_{\sim}$ *and analysing the induced MC provides a solution to both synthesis problems.*

However, optimising over exponentially many consistent schedulers solves the NP-complete feasibility synthesis problem, rendering such an iterative approach unlikely to be efficient. Another natural approach is to employ solving techniques for NP-complete problems, like satisfiability modulo linear real arithmetic.

**Approach 4 (SMT).** *A dedicated SMT-encoding (in [11]) of the induced MCs of consistent schedulers from* $M^{\mathfrak{D}}_{\sim}$ *that solves the feasibility problem.*

### **4.3 Refinement Loop**

Although iterating over consistent schedulers (Approach 3) is not feasible, model checking of $M^{\mathfrak{D}}_{\sim}$ still provides useful information for the analysis of the family $\mathfrak{D}$. Recall the feasibility synthesis problem for $\phi = \mathbb{P}_{\leq\lambda}(\varphi)$. If $\mathrm{Prob}^{\max}(M^{\mathfrak{D}}_{\sim}, \varphi) \leq \lambda$, then all realisations of $\mathfrak{D}$ satisfy $\phi$. On the other hand, $\mathrm{Prob}^{\min}(M^{\mathfrak{D}}_{\sim}, \varphi) > \lambda$ implies that there is no realisation satisfying $\phi$. If $\lambda$ lies between the min and max probability, and the scheduler inducing the min probability is not consistent, we cannot conclude anything yet, i.e., the abstraction is too coarse. A natural countermeasure is to refine the abstraction represented by $M^{\mathfrak{D}}_{\sim}$, in particular, to split the set of realisations, leading to two synthesis sub-problems.
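The case distinction above, for a specification $\mathbb{P}_{\leq\lambda}(\varphi)$, can be summarised in a few lines. The function name, the returned labels, and the numeric bounds in the usage lines are illustrative assumptions; the third branch uses that a consistent minimising scheduler corresponds to a realisation (Lemma 1) whose probability is at most $\lambda$.

```python
def classify(prob_min, prob_max, lam, min_scheduler_consistent=False):
    """Interpret quotient-MDP bounds for a specification P_{<= lam}(phi)."""
    if prob_max <= lam:
        # Even the worst scheduler stays below the threshold.
        return "all realisations satisfy"
    if prob_min > lam:
        # Even the best scheduler, consistent or not, exceeds the threshold.
        return "no realisation satisfies"
    if min_scheduler_consistent:
        # The minimising scheduler is itself a family member below lam.
        return "feasible"
    return "inconclusive"

# Hypothetical bounds, chosen only to exercise each branch.
print(classify(0.2, 0.4, lam=0.5))
print(classify(0.6, 0.9, lam=0.5))
print(classify(0.2, 0.9, lam=0.5, min_scheduler_consistent=True))
print(classify(0.2, 0.9, lam=0.5))
```

Only the last outcome requires refinement; the other three settle the feasibility question for the whole (sub)family with a single pair of MDP model-checking queries.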

**Definition 11 (Splitting).** *Let* D *be a family of MCs, and* R ⊆ R<sup>D</sup> *a set of realisations. For* k ∈ K *and predicate* A<sub>k</sub> *over* S*,* splitting *partitions* R *into*

> R<sub>⊤</sub> = {r ∈ R | A<sub>k</sub>(r(k))} *and* R<sub>⊥</sub> = {r ∈ R | ¬A<sub>k</sub>(r(k))}.

Splitting the set of realisations, and considering the subfamilies separately, rather than splitting states in the quotient MDP, is crucial for the performance of the synthesis process as we avoid rebuilding the quotient MDP in each iteration. Instead, we only restrict the actions of the MDP to the particular subfamily.
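A minimal sketch of Definition 11, assuming that a realisation is represented as a dictionary from parameter names to values (this representation and the function name `split` are illustrative choices, not taken from the prototype):

```python
from typing import Callable, Dict, Hashable, List, Tuple

Realisation = Dict[str, Hashable]  # parameter name -> chosen value


def split(
    realisations: List[Realisation],
    k: str,
    predicate: Callable[[Hashable], bool],
) -> Tuple[List[Realisation], List[Realisation]]:
    """Partition the realisations by whether the value of parameter k satisfies the predicate."""
    top = [r for r in realisations if predicate(r[k])]
    bottom = [r for r in realisations if not predicate(r[k])]
    return top, bottom
```

Only the set of realisations is partitioned here; the quotient MDP itself is left untouched, which matches the motivation given above.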

**Definition 12 (Restricting).** *Let* M<sup>D</sup><sub>∼</sub> = (S<sup>D</sup><sub>∼</sub>, [s<sup>D</sup><sub>0</sub>]<sub>∼</sub>, *Act*<sup>D</sup>, P<sup>D</sup><sub>∼</sub>) *be a quotient MDP and* R ⊆ R<sup>D</sup> *a set of realisations. The* restriction *of* M<sup>D</sup><sub>∼</sub> *wrt.* R *is the MDP* M<sup>D</sup><sub>∼</sub>[R] = (S<sup>D</sup><sub>∼</sub>, [s<sup>D</sup><sub>0</sub>]<sub>∼</sub>, *Act*<sup>D</sup>[R], P<sup>D</sup><sub>∼</sub>) *where Act*<sup>D</sup>[R] = {a<sub>r</sub> | r ∈ R}.<sup>3</sup>

<sup>3</sup> Naturally, P<sup>D</sup><sub>∼</sub> in M<sup>D</sup><sub>∼</sub>[R] is restricted to *Act*<sup>D</sup>[R].
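Restriction can be illustrated on a simple explicit representation. The sketch below assumes the transition function is stored as nested dictionaries (state, then action label, then successor distribution) and that each action is labelled with the name of a single realisation; the merged actions of Remark 1 are not modelled.

```python
from typing import Dict, Set

Distribution = Dict[str, float]                    # successor state -> probability
Transitions = Dict[str, Dict[str, Distribution]]   # state -> action label -> distribution


def restrict(transitions: Transitions, allowed: Set[str]) -> Transitions:
    """Keep only the actions whose realisation label lies in `allowed`.

    States and distributions are reused as they are; nothing is rebuilt.
    """
    return {
        state: {a: dist for a, dist in actions.items() if a in allowed}
        for state, actions in transitions.items()
    }
```

Since the inner distributions are shared rather than copied, the cost of a restriction in this sketch is proportional to the number of state-action pairs.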


The splitting operation is the core of the proposed abstraction-refinement. Due to space constraints, we do not consider feasibility separately.

Algorithm 1 illustrates the *threshold synthesis* process. Recall that the goal is to decompose the set R<sup>D</sup> into realisations satisfying and violating a given specification, respectively. The algorithm uses a set U to store subfamilies of R<sup>D</sup> that have not yet been classified as satisfying or violating. It starts by building the quotient MDP with merged actions. That is, we never construct the all-in-one MDP, and we merge actions as discussed in Remark 1. For every R ∈ U, the algorithm restricts the set of realisations to obtain the corresponding subfamily. For the restricted quotient MDP, the algorithm runs standard MDP model checking to compute the max and min probability and the corresponding schedulers. Then, the algorithm either classifies R as satisfying/violating, or splits it based on a suitable predicate, and updates U accordingly. We describe the splitting strategy in the next subsection. The algorithm terminates if U is empty, i.e., all subfamilies have been classified. As only a finite number of subfamilies of realisations has to be evaluated, termination is guaranteed.
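The loop described above can be sketched on a toy problem. This is not the authors' implementation: MDP model checking is replaced by a hypothetical interval abstraction `abstract_bounds` of the function f(p) = p·(1 − p) over a single parameter p ∈ [0, 1], and subfamilies are simply split in the middle. The abstraction bounds the two occurrences of p independently, loosely mirroring an inconsistent scheduler that picks different parameter values in different states.

```python
from typing import List, Tuple

Family = Tuple[float, ...]  # sorted candidate values of one parameter p in [0, 1]


def abstract_bounds(family: Family) -> Tuple[float, float]:
    """Toy stand-in for min/max model checking of f(p) = p * (1 - p).

    Sound but possibly loose: exact only for singleton families.
    """
    lo, hi = min(family), max(family)
    return lo * (1 - hi), hi * (1 - lo)


def threshold_synthesis(family: Family, lam: float) -> Tuple[List[Family], List[Family]]:
    """Decompose the family into subfamilies with f(p) <= lam and f(p) > lam."""
    satisfying: List[Family] = []
    violating: List[Family] = []
    undecided = [family]
    while undecided:
        sub = undecided.pop()
        low, high = abstract_bounds(sub)
        if high <= lam:
            satisfying.append(sub)      # all members are below the threshold
        elif low > lam:
            violating.append(sub)       # all members are above the threshold
        else:
            mid = len(sub) // 2         # inconclusive: split and retry
            undecided += [sub[:mid], sub[mid:]]
    return satisfying, violating
```

Termination follows the argument in the text: singleton subfamilies have coinciding bounds and are therefore always classified.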

The refinement loop for max synthesis is very similar, cf. Algorithm 2. Recall that now the goal is to find the realisation r<sup>∗</sup> that maximises the satisfaction probability max<sup>∗</sup> of a path formula. The difference between the algorithms lies in the interpretation of the results of the underlying MDP model checking. If the max probability for R is below max<sup>∗</sup>, R can be discarded. Otherwise, we check whether the corresponding scheduler σ<sub>max</sub> is consistent. If consistent, the algorithm updates r<sup>∗</sup> and max<sup>∗</sup>, and discards R. If the scheduler is not consistent but min > max<sup>∗</sup> holds, we can still update max<sup>∗</sup> and improve the pruning process, as it means that some realisation (we do not know which) in R induces a higher probability than max<sup>∗</sup>. Regardless of whether max<sup>∗</sup> has been updated, the algorithm has to split R based on some predicate, and analyse its subfamilies as they may include the maximising realisation.

#### **Algorithm 1.** Threshold synthesis

#### **Algorithm 2.** Max synthesis

**Input:** A family D of MCs with the set R<sup>D</sup> of realisations, and a path formula φ

**Output:** A realisation r<sup>∗</sup> ∈ R<sup>D</sup> according to Problem 2.

1: max<sup>∗</sup> ← −∞, U ← {R<sup>D</sup>}
2: M<sup>D</sup><sub>∼</sub> ← buildQuotientMDP(D, R<sup>D</sup>, ∼<sub>f</sub>) ▷ Applying Def. 7 and 8
3: **while** U ≠ ∅ **do**
4: **select** R ∈ U **and** U ← U \ {R}
5: M<sup>D</sup><sub>∼</sub>[R] ← restrict(M<sup>D</sup><sub>∼</sub>, R) ▷ Applying Def. 12
6: (max, σ<sub>max</sub>) ← solveMaxMDP(M<sup>D</sup><sub>∼</sub>[R], φ)
7: (min, σ<sub>min</sub>) ← solveMinMDP(M<sup>D</sup><sub>∼</sub>[R], φ)
8: **if** max > max<sup>∗</sup> **then**
9: **if** isConsistent(σ<sub>max</sub>) **then** r<sup>∗</sup> ← qmax, max<sup>∗</sup> ← max
10: **else**
11: **if** min > max<sup>∗</sup> **then** max<sup>∗</sup> ← min
12: U ← U ∪ split(R, selPredicate(max, σ<sub>max</sub>, min, σ<sub>min</sub>)) ▷ See Sect. 4.4
13: **return** r<sup>∗</sup>
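The control flow of Algorithm 2 can be mimicked on a toy problem. The sketch below is an illustration under assumptions: model checking is replaced by a hypothetical interval abstraction of f(p) = p·(1 − p) for one parameter p ∈ [0, 1], the "scheduler" is the pair of values chosen for the two occurrences of p, and it counts as consistent when both choices coincide. Splitting is done in the middle rather than by the strategy of Sect. 4.4.

```python
from typing import Optional, Tuple

Family = Tuple[float, ...]  # sorted candidate values of one parameter p in [0, 1]


def abstract_max(family: Family) -> Tuple[float, Tuple[float, float]]:
    """Upper bound on p * (1 - p) and the (possibly inconsistent) choices behind it."""
    lo, hi = min(family), max(family)
    return hi * (1 - lo), (hi, lo)


def abstract_min(family: Family) -> float:
    """Lower bound on p * (1 - p) over the family."""
    lo, hi = min(family), max(family)
    return lo * (1 - hi)


def max_synthesis(family: Family) -> Tuple[Optional[float], float]:
    best, best_val = None, float("-inf")
    undecided = [family]
    while undecided:
        sub = undecided.pop()
        upper, (first, second) = abstract_max(sub)
        if upper <= best_val:
            continue                      # cannot improve: discard the subfamily
        if first == second:               # consistent: a real family member
            best, best_val = first, upper
        else:
            lower = abstract_min(sub)
            if lower > best_val:          # some unknown member beats best_val
                best_val = lower
            mid = len(sub) // 2
            undecided += [sub[:mid], sub[mid:]]
    return best, best_val
```

As in the algorithm, raising the bound from the minimum improves pruning even though the witness realisation is not yet known.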

#### **4.4 Splitting Strategies**

If verifying the quotient MDP M<sup>D</sup><sub>∼</sub>[R] cannot classify the set of realisations R as satisfying or violating, we split R, and we guide the splitting strategy by the obtained verification results. The splitting operation chooses a suitable parameter k ∈ K and predicate A<sub>k</sub> that partition the realisations R into R<sub>⊤</sub> and R<sub>⊥</sub> (see Definition 11). A good splitting strategy globally reduces the number of model-checking calls required to classify all r ∈ R.

The two key aspects to locally determine a good k are: (1) the *variance*, that is, how the splitting may narrow the difference between max = Prob<sup>max</sup>(M<sup>D</sup><sub>∼</sub>[X], φ) and min = Prob<sup>min</sup>(M<sup>D</sup><sub>∼</sub>[X], φ) for both X = R<sub>⊤</sub> and X = R<sub>⊥</sub>, and (2) the *consistency*, that is, how the splitting may reduce the inconsistency of the schedulers σ<sub>max</sub> and σ<sub>min</sub>. These aspects cannot be evaluated precisely without applying all the split operations and solving the new MDPs M<sup>D</sup><sub>∼</sub>[R<sub>⊥</sub>] and M<sup>D</sup><sub>∼</sub>[R<sub>⊤</sub>]. Therefore, we propose an efficient strategy that selects k and A<sub>k</sub> based on a lightweight analysis of the model-checking results for M<sup>D</sup><sub>∼</sub>[R]. The strategy applies two *scores*, variance(k) and consistency(k), that estimate the influence of k on the two key aspects. For any k, the scores are accumulated over all *important states* s (reachable via σ<sub>max</sub> or σ<sub>min</sub>, respectively) where P(s)(k) ≠ 0. A state s is important for R and some δ ∈ R<sub>≥0</sub> if

$$\frac{\mathsf{Prob}^{\max}(M^{\mathfrak{D}}\_{\sim}[\mathcal{R}],\phi)(s) - \mathsf{Prob}^{\min}(M^{\mathfrak{D}}\_{\sim}[\mathcal{R}],\phi)(s)}{\mathsf{Prob}^{\max}(M^{\mathfrak{D}}\_{\sim}[\mathcal{R}],\phi) - \mathsf{Prob}^{\min}(M^{\mathfrak{D}}\_{\sim}[\mathcal{R}],\phi)} \geq \delta$$

where Prob<sup>min</sup>(·)(s) and Prob<sup>max</sup>(·)(s) are the min and max probability in the MDP with initial state s. To reduce the overhead of computing the scores, we simplify the scheduler representation. In particular, for σ<sub>max</sub> and every k ∈ K, we extract a map C<sup>k</sup><sub>max</sub> : T<sub>k</sub> → N, where C<sup>k</sup><sub>max</sub>(t) is the number of important states s for which σ<sub>max</sub>(s) = a<sub>r</sub> with r(k) = t. The mapping C<sup>k</sup><sub>min</sub> represents σ<sub>min</sub> analogously.

We define variance(k) = Σ<sub>t∈T<sub>k</sub></sub> |C<sup>k</sup><sub>max</sub>(t) − C<sup>k</sup><sub>min</sub>(t)|, leading to high scores if the two schedulers vary a lot. Further, we define consistency(k) = size(C<sup>k</sup><sub>max</sub>) · max(C<sup>k</sup><sub>max</sub>) + size(C<sup>k</sup><sub>min</sub>) · max(C<sup>k</sup><sub>min</sub>), where size(C) = |{t ∈ T<sub>k</sub> | C(t) > 0}| − 1 and max(C) = max<sub>t∈T<sub>k</sub></sub> {C(t)}, leading to high scores if the parameter has clear favourites for σ<sub>max</sub> and σ<sub>min</sub>, but values from its full range are chosen.
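Both scores are straightforward to compute once the count maps are available. The following sketch assumes the maps C<sup>k</sup><sub>max</sub> and C<sup>k</sup><sub>min</sub> are given as dictionaries from parameter values to counts; the helper names `_size` and `_peak` are illustrative.

```python
from typing import Dict, Hashable

Counts = Dict[Hashable, int]  # parameter value t -> number of important states choosing t


def variance(c_max: Counts, c_min: Counts) -> int:
    """Sum over all values t of |C_max(t) - C_min(t)|."""
    values = set(c_max) | set(c_min)
    return sum(abs(c_max.get(t, 0) - c_min.get(t, 0)) for t in values)


def _size(c: Counts) -> int:
    """Number of values that are chosen at least once, minus one."""
    return sum(1 for n in c.values() if n > 0) - 1


def _peak(c: Counts) -> int:
    """Largest count, i.e. how often the favourite value is chosen."""
    return max(c.values(), default=0)


def consistency(c_max: Counts, c_min: Counts) -> int:
    return _size(c_max) * _peak(c_max) + _size(c_min) * _peak(c_min)
```

For example, if σ<sub>max</sub> always picks value 0 and σ<sub>min</sub> always picks value 1 in three important states, the variance is 6 while the consistency score is 0, since each scheduler is already consistent on this parameter.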

As indicated, we consider different strategies for the two synthesis problems. For threshold synthesis, we favour the impact on the variance as we principally do not need consistent schedulers. For the max synthesis, we favour the impact on the consistency, as we need a consistent scheduler inducing the max probability.

Predicate A<sub>k</sub> is based on reducing the variance: The strategy selects T ⊂ T<sub>k</sub> with |T| = ½ |T<sub>k</sub>|, containing those t for which C<sup>k</sup><sub>max</sub>(t) − C<sup>k</sup><sub>min</sub>(t) is the largest. The goal is to get a set of realisations that induce a large probability (the ones including T for parameter k) and the complement inducing a small probability.
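The selection of T can be sketched as follows. The function name `select_values` is hypothetical, the count maps are assumed to be dictionaries, and rounding down for domains of odd size (while keeping at least one value) is an assumption, since the text does not fix the rounding.

```python
from typing import Dict, FrozenSet, Hashable, Sequence

Counts = Dict[Hashable, int]


def select_values(domain: Sequence[Hashable], c_max: Counts, c_min: Counts) -> FrozenSet[Hashable]:
    """Pick half of the domain: the values t with the largest C_max(t) - C_min(t)."""
    ranked = sorted(
        domain,
        key=lambda t: c_max.get(t, 0) - c_min.get(t, 0),
        reverse=True,
    )
    half = max(1, len(domain) // 2)  # assumption: round down, keep at least one value
    return frozenset(ranked[:half])
```

The predicate A<sub>k</sub> is then membership in the returned set, e.g. `chosen = select_values(...)` followed by `lambda value: value in chosen`.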

**Approach 5 (MDP-based abstraction refinement).** *The methods underlying Algorithms 1 and 2, together with the splitting strategies, provide solutions to the synthesis problems and are referred to as* MDP abstraction *methods.*

### **5 Experiments**

We implemented the proposed synthesis methods as a Python prototype using Storm [25]. In particular, we use the Storm Python API for model-adaption, -building, and -checking as well as for scheduler extraction. For SMT solving, we use Z3 [39] via pySMT [26]. The tool-chain takes a PRISM [38] or JANI [8] model with open integer constants, together with a set of expressions with possible values for these constants. The model may include the parallel composition of several modules/automata. The open constants may occur in guards<sup>4</sup>, probability definitions, and updates of the commands/edges. Via adequate annotations, we identify the parameter values that yield a particular action. The annotations are key to interpret the schedulers, and to restrict the quotient without rebuilding.

All experiments were executed on a Macbook MF839LL/A with 8 GB RAM memory limit and a 12 h time out. All algorithms can significantly benefit from coarse-grained parallelisation, which we therefore do not consider here.

#### **5.1 Research Questions and Benchmarks**

The goal of the experimental evaluation is to answer the research question: *How do the proposed MDP-based abstraction methods (Approaches* 3–5*) cope with the inherent complexity (i.e. the NP-hardness) of the synthesis problems (cf. Problems* 1 *and* 2*)?* To answer this question, we compare their performance with Approaches 1 and 2 [18], representing state-of-the-art solutions and the base-line algorithms. The experiments show that the performance of the MDP abstraction significantly varies for different case studies. Thus, we consider benchmarks from various application domains to *identify the key characteristics of the synthesis problems affecting the performance of our approach.*

<sup>4</sup> Slight care by the user is necessary to avoid deadlocks.

**Table 1.** Benchmarks and timings for Approaches 1–3

*Benchmarks description.* We consider the following case studies: *Maze* is a planning problem typically considered as POMDP, e.g. in [41]. The family describes all MCs induced by small-memory [14,35] observation-based deterministic strategies (with a fixed upper bound on the memory). We are interested in the expected time to the goal. In [35], parameter synthesis was used to find randomised strategies, using [22]. *Pole* considers balancing a pole in a noisy and unknown environment (motivated by [2,12]). At deploy time, the controller has a prior over a finite set of environment behaviours, and should optimise the expected behavior without depending on the actual (hidden) environment. The family describes schedulers that do not depend on the hidden information. We are interested in the expected time until failure. *Herman* is an asynchronous encoding of the distributed Herman protocol for self-stabilising rings [33,37]. The protocol is extended with a bit of memory for each station in the ring, and the choice to flip various unfair coins. Nodes in the ring are anonymous, they all behave equivalently (but may change their local memory based on local events). The family describes variations of memory-updates and coin-selection, but preserves anonymity. We are interested in the expected time until stabilisation. *DPM* considers a partial information scheduler for a disk power manager motivated by [7,27]. We are interested in the expected energy consumption. *BSN* (Body sensor network, [43]) describes a network of connected sensors that identify health-critical situations. We are interested in the reliability. The family contains various configurations of the used sensors. *BSN* is the largest software product line benchmark used in [18]. We drop some implications between features (parameters for us) as this is not yet supported by our modelling language. We thereby extended the family.

Table 1 shows the relevant statistics for each benchmark: the benchmark name, the (approximate) range of the min and max probability/reward for the given family, the number of non-singleton parameters |K|, and the number of family members |D|. Then, for the family members the average number of states and transitions of the MCs, and the states, actions (= Σ<sub>s∈S</sub> |*Act*(s)|), and transitions of the quotient MDP. Finally, it lists in seconds the run time of the base-line algorithms and the consistent scheduler enumeration<sup>5</sup>. The base-line algorithms employ the one-by-one and the all-in-one technique, using either a BDD or a sparse matrix representation. We report the best results. MOs indicate breaking the memory limit. Only the all-in-one approach required significant memory. As expected, the SMT-based implementation provides an inferior performance and thus we do not report its results.

**Table 2.** Results for threshold synthesis via abstraction-refinement

#### **5.2 Results and Discussion**

To simplify the presentation, we focus primarily on the threshold synthesis problem as it allows a compact presentation of the key aspects. Below, we provide some remarks about the performance for the max and feasibility synthesis.

*Results.* Table 2 shows results for threshold synthesis. The first two columns indicate the benchmark and the various thresholds. For each threshold λ, the table lists the number of family members below (above) λ, each with the number of subfamilies that together contain these instances, and the number of singleton subfamilies that were considered. The last table part gives the number of iterations of the loop in Algorithm 1, and timing information (total, build/restrict times, model checking times, scheduler analysis times). The last column gives the speed-up over the best base-line (based on the estimates).

*Key observations.* The speed-ups drastically vary, which shows that the MDP abstraction often achieves a superior performance but may also lead to a performance degradation in some cases. We identify four key factors.

<sup>5</sup> Values with a <sup>∗</sup> are estimated by sampling a large fraction of the family.

**Iterations.** As typical for CEGAR approaches, the key characteristic of the benchmark that affects the performance is the number N of iterations in the refinement loop. The abstraction introduces an overhead per iteration caused by performing two MDP verification calls and by the scheduler analysis. The run time for *BSN*, with a small |D|, is actually significantly affected by the initialisation of various data structures; thus only a small speedup is achieved.

**Abstraction size.** The size of the quotient, compared to the average size of the family members, is relevant too. The quotient includes at least all reachable states of all family members, and may be significantly larger if an inconsistent scheduler reaches states which are unreachable under any consistent scheduler. The existence of such states is a common artefact from encoding families in high-level languages. Table 1, however, indicates that we obtain a very compact representation for *Maze* and *Pole*.

**Thresholds.** The most important aspect is the threshold λ. If λ is closer to the optima, the abstraction requires a smaller number of iterations, which directly improves the performance. We emphasise that in various domains, thresholds that ask for close-to-optimal solutions are indeed of highest relevance as they typically represent the system designs developers are most interested in [44]. *Why do thresholds affect the number of iterations?* Consider a family with T<sub>k</sub> = {0, 1} for each k. Geometrically, the set R<sup>D</sup> can be visualised as a |K|-dimensional cube. The cube-vertices reflect family members. Assume for simplicity that one of these vertices is optimal with respect to the specification. Especially in benchmarks where parameters are equally important, the induced probability of a vertex roughly corresponds to the Manhattan distance to the optimal vertex. Thus, vertices above the threshold induce a diagonal hyperplane, which our splitting method approximates with orthogonal splits. Splitting diagonally is not possible, as it would induce optimising over observation-based schedulers. Consequently, we need more and more splits the more the diagonal goes through the middle of the cube. *Even when splitting optimally, there is a combinatorial blow-up in the required splits when the threshold is further from the optimal values.* Another effect is that thresholds far from optima are more affected by the over-approximation of the MDP model-checking results and thus yield more inconclusive answers.
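The geometric argument can be made concrete with a small counting sketch. It rests on idealised assumptions that go beyond the text: all parameters are binary and equally important, the quality of a vertex is exactly its Manhattan distance to the optimal vertex, a sub-cube is classified as soon as all of its vertices lie on the same side of the threshold, and each split fixes one parameter. The function name `boxes_needed` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def boxes_needed(free: int, budget: int) -> int:
    """Count the classified sub-cubes produced by orthogonal splitting.

    free   -- number of binary parameters not yet fixed in the current sub-cube
    budget -- threshold minus the number of parameters already fixed to the
              non-optimal value (i.e. remaining admissible Manhattan distance)
    """
    if budget < 0:
        return 1          # every vertex of the sub-cube is beyond the threshold
    if free <= budget:
        return 1          # every vertex of the sub-cube is within the threshold
    # Inconclusive: fix one more parameter to the optimal / non-optimal value.
    return boxes_needed(free - 1, budget) + boxes_needed(free - 1, budget - 1)
```

With ten parameters, a threshold that admits only the optimal vertex (`boxes_needed(10, 0)`) or all but the worst vertex (`boxes_needed(10, 9)`) needs 11 sub-cubes in this model, whereas a threshold through the middle of the cube (`boxes_needed(10, 5)`) needs considerably more.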

**Refinement strategy.** So far, we reasoned about optimal splits. Due to the computational overhead, our strategy cannot ensure optimal splits. Instead, the strategy depends mostly on information encoded in the computed MDP strategies. *In models where the optimal parameter value heavily depends on the state, the obtained schedulers are highly inconsistent and carry only limited information for splitting.* Consequently, in such benchmarks we split sub-optimally. The suboptimality has a major impact on the performance for *Herman* as all obtained strategies are highly inconsistent – they take a different coin for each node, which is good to speed up the stabilisation of the ring.

*Summary.* MDP abstraction is not a silver bullet. It has a lot of potential in threshold synthesis when the threshold is close to the optima. Consequently, *feasibility synthesis with unsatisfiable specifications is handled perfectly well by MDP abstraction*, while this is the worst-case for enumeration-based approaches. Likewise, *max synthesis* can be understood as threshold synthesis with a shifting threshold max∗: If the max<sup>∗</sup> is quickly set close to max, MDP abstraction yields superior performance. Roughly, we can quickly approximate max∗ when some of the parameter values are clearly beneficial for the specification.

### **6 Conclusion and Future Work**

We contributed to the efficient analysis of families of Markov chains. In particular, we discussed and implemented existing approaches to solve practically interesting synthesis problems, and devised a novel abstraction refinement scheme that mitigates the computational complexity of the synthesis problems, as shown by the empirical evaluation. In the future, we will include refinement strategies based on counterexamples as in [23,34].

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Optimal Time-Bounded Reachability Analysis for Concurrent Systems**

Yuliya Butkova(B) and Gereon Fox

Saarland University, Saarbrücken, Germany {butkova,fox}@depend.uni-saarland.de

**Abstract.** Efficient optimal scheduling for concurrent systems on a finite horizon remains a challenging task to date: Not only does time have a continuous domain, but in addition there are exponentially many possible decisions to choose from at every time point.

In this paper we present a solution to the problem of optimal time-bounded reachability for Markov automata, one of the most general formalisms for modelling concurrent systems. Our algorithm is based on the discretisation of the time horizon. In contrast to most existing algorithms for similar problems, the discretisation step is not fixed. We attempt to discretise only at those time points where the optimal scheduler *in fact* changes its decision. Our empirical evaluation demonstrates that the algorithm improves on existing solutions by up to several orders of magnitude.

### **1 Introduction**

Modern technologies grow and complexify rapidly, making it hard to ensure their dependability and reliability. Formal approaches to describing these systems include (generalised) stochastic Petri nets [Mol82,MCB84,MBC+98,Bal07], stochastic activity networks [MMS85], dynamic fault trees [BCS10] and others. The semantics of these modelling languages is often defined in terms of *continuous time Markov chains* (CTMCs). CTMCs can model the behaviour of seemingly independent processes evolving in memoryless continuous time (according to exponential distributions).

Modelling a system as a CTMC, however, strips it of any notion of *choice*, e. g., which of a number of requests to process first, or how to optimally balance the load over multiple servers of a cluster. Making sure that the system is safe for all possible choices of this kind is an important issue when assessing its reliability. *Non-determinism* allows the modeller to capture these choices. Modelling systems with non-determinism is possible in formalisms such as *interactive Markov chains* [Her02], or *Markov automata* (MA) [EHKZ13]. The latter are one of the most general models for concurrent systems available and can serve as a semantics for generalised stochastic Petri nets and dynamic fault trees.

This work is supported by the ERC Advanced Investigators Grant 695614 (POWVER) and by the German Research Foundation (DFG) Grant 389792660, as part of CRC 248 (see https://perspicuous-computing.science).

© The Author(s) 2019

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 191–208, 2019. https://doi.org/10.1007/978-3-030-17465-1\_11

A similar formalism, *continuous time Markov decision processes* (CTMDPs) [Ber00,Put94], has seen wide-spread use in control theory and operations research. In fact, MA and CTMDPs are closely related: They both can model exponential Markovian transitions and non-determinism. However, MA are *compositional*, while CTMDPs are not: In general it is not possible to model a system as a CTMDP by modelling each of its sub-components as a smaller CTMDP and then combining them. This is why modelling large systems with many communicating sub-components as a CTMDP is cumbersome and error-prone. In fact, most modern model checkers, such as Storm [DJKV17], Modest [HH14] and PRISM [KNP11], do not offer any support for CTMDPs.

In the analysis of MA and CTMDPs, one of the most challenging problems is the approximation of *optimal time-bounded reachability probability*, i. e. the maximal (or minimal) probability of a system to reach a set of goal states (e. g. unsafe states) within a given time bound. Due to the presence of non-determinism this value depends on which decisions are taken at which time points. Since the optimal strategy is time dependent there are continuously many different strategies. Classically, one deals with continuity by discretising the values, as is the case in most algorithms for CTMDPs and MA [Neu10,FRSZ16,HH15,BS11]: The time horizon is discretised into finitely many intervals, and the value within each interval is approximated by e. g. polynomial or exponential functions.

Discretisation is closely related to the scheduler that is optimal for a specific MA. As an example, consider Fig. 1: The plot shows the probabilities of reaching a goal state for a certain time bound, by choosing options 1 and 2. If less than 0.9 seconds remain, option 1 has a higher probability of reaching the goal set, while option 2 is preferable as long as more than 0.9 seconds are left. In this example it is enough to discretise the time horizon with roughly 2 intervals: [0, 0.9] and (0.9, 1.5]. The algorithms known to date however use from 200 to 2·10<sup>6</sup> intervals, which is far too many. The solution that we present in this paper discretises the time horizon into only *three* intervals for this example.

**Fig. 1.** Reachability probability for different decisions

*Our contribution* consists in an algorithm that computes time bounded reachability probabilities for Markov automata. The algorithm discretises the time horizon by intervals of variable length, making them smaller near those time points where the optimal scheduler switches from one decision to another. We give a characterisation of these time points, as well as tight sufficient conditions for no such time point to exist within an interval. We present an empirical evaluation of the performance of the algorithm and compare it to other algorithms available for Markov automata. The algorithm does perform well in the comparison, improving in some cases by several orders of magnitude, but does not strictly outperform available solutions.

### **2 Preliminaries**

Given a finite set S, a *probability distribution* over S is a function μ : S → [0, 1], s. t. Σ<sub>s∈S</sub> μ(s) = 1. We denote the set of all probability distributions over S by Dist(S). The sets of rational, real and natural numbers are denoted by Q, R and N, respectively; X<sub>⋈0</sub> = {x ∈ X | x ⋈ 0} for X ∈ {Q, R} and ⋈ ∈ {>, ≥}; N<sub>0</sub> = N ∪ {0}.

**Definition 1.** *A* Markov automaton (MA)<sup>1</sup> *is a tuple* M = (S, *Act*, **P**, Q, G) *where* S *is a finite set of states partitioned into* probabilistic *(PS) and* Markovian *(MS),* G ⊆ S *is a set of* goal states*, Act is a finite set of actions,* **P** : *PS* × *Act* → Dist(S) *is the probabilistic transition matrix,* Q : *MS* × S → Q *is the Markovian transition matrix, s. t.* Q(s, s′) ≥ 0 *for* s ≠ s′*,* Q(s, s) = −Σ<sub>s′≠s</sub> Q(s, s′)*.*

Figure 2 shows an example MA. Grey and white colours denote Markovian and probabilistic states correspondingly. Transitions labelled as α or β are actions of state s1. Dashed transitions associated with an action represent the distribution assigned to the action. Purely solid transitions are Markovian.

*Notation and further definitions:* For a Markovian state s ∈ *MS* and s′ ≠ s, we call Q(s, s′) the *transition rate* from s to s′. The *exit rate* of a Markovian state s is E(s) := Σ<sub>s′≠s</sub> Q(s, s′). **E**<sub>max</sub> denotes the maximal exit rate among all the Markovian states of M. For a probabilistic state s ∈ *PS*, Act(s) = {α ∈ Act | ∃μ ∈ Dist(S) : **P**(s, α) = μ} denotes the set of actions that are *enabled* in s. P[s, α, ·] ∈ Dist(S) is defined by P[s, α, s′] := μ(s′), where **P**(s, α) = μ. We impose the usual *non-zenoness* [GHH+14] restriction on MA. This disallows e. g., probabilistic states with no outgoing transitions, or with only self-loop transitions.

**Fig. 2.** An example MA.

A *(timed) path* in M is a finite or infinite sequence $\rho = s\_0 \xrightarrow{\alpha\_0, t\_0} s\_1 \xrightarrow{\alpha\_1, t\_1} \cdots \xrightarrow{\alpha\_k, t\_k} s\_{k+1} \xrightarrow{\alpha\_{k+1}, t\_{k+1}} \cdots$, where α<sub>i</sub> ∈ Act(s<sub>i</sub>) for s<sub>i</sub> ∈ *PS*, and α<sub>i</sub> = ⊥ for s<sub>i</sub> ∈ *MS*. For a finite path $\rho = s\_0 \xrightarrow{\alpha\_0, t\_0} s\_1 \xrightarrow{\alpha\_1, t\_1} \cdots \xrightarrow{\alpha\_{k-1}, t\_{k-1}} s\_k$ we define ρ↓ = s<sub>k</sub>. The set of all finite (infinite) paths of M is denoted by *Paths*<sup>∗</sup> (*Paths*).

Time passes continuously in Markovian states. The system leaves the state after an amount of time that is governed by an exponential distribution, i. e. the probability of leaving s ∈ *MS* within t ≥ 0 time units is given by 1 − e<sup>−E(s)·t</sup>, after which the next state s′ is chosen with probability Q(s, s′)/E(s).
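These quantities are easy to compute for a single Markovian state. The sketch below assumes the off-diagonal row of Q for state s is given as a dictionary from successor states to rates; the function names are illustrative.

```python
import math
from typing import Dict

Rates = Dict[str, float]  # successor s' != s  ->  transition rate Q(s, s')


def exit_rate(rates: Rates) -> float:
    """E(s): the sum of all outgoing transition rates."""
    return sum(rates.values())


def leave_probability(rates: Rates, t: float) -> float:
    """Probability of leaving the state within t time units: 1 - exp(-E(s) * t)."""
    return 1.0 - math.exp(-exit_rate(rates) * t)


def branching(rates: Rates) -> Dict[str, float]:
    """Probability Q(s, s') / E(s) of moving to each successor once the state is left."""
    total = exit_rate(rates)
    return {succ: q / total for succ, q in rates.items()}
```

For a state with rates 2 and 1 to two successors, the exit rate is 3 and the first successor is taken with probability 2/3.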

Probabilistic transitions happen instantaneously. Whenever the system is in a probabilistic state s and an action α ∈ Act(s) is chosen, the successor s′ is selected according to the distribution P[s, α, ·] and the system moves from s to s′ right away. Thus, the residence time in probabilistic states is always 0.

<sup>1</sup> Strictly speaking, this is the definition of a *closed* Markov automaton in which no state has two actions with the same label. This is however not a restriction since the analysis of *general* Markov automata is always performed only after the composition under the urgency assumption is performed. Additional renaming of the actions does not affect the properties considered in this work.

#### **2.1 Time-Bounded Reachability**

In this work we are interested in the probability to reach a certain set of states of a Markov automaton within a given time bound. However, due to the presence of multiple actions in probabilistic states the behaviour of a Markov automaton is not a stochastic process and thus no probability measure can be defined. This issue is resolved by introducing the notion of a scheduler.

A *general scheduler (or strategy)* π : *Paths*<sup>∗</sup> → Dist(Act) is a measurable function, s. t. ∀ρ ∈ *Paths*<sup>∗</sup> if ρ↓ ∈ *PS* then π(ρ) ∈ Dist(Act(ρ↓)). General schedulers provide a distribution over enabled actions of a probabilistic state given that the path ρ has been observed from the beginning of the system evolution. We call *stationary* such a general scheduler π that can be represented as π : *PS* → Act, i. e. it is non-randomised and depends only on the current state. The set of all general (stationary) schedulers is denoted by Πgen (Πstat resp.).

Given a general scheduler π, the behaviour of a Markov automaton is a fully defined stochastic process. For the definition of the probability measure Pr<sup>π</sup> <sup>M</sup> on Markov automata we refer to [Hat17].

Let s ∈ S, T ∈ Q<sub>≥0</sub> be a time bound and π ∈ Π<sub>gen</sub> be a general scheduler. The *(time-bounded) reachability probability* (or *value*) for a scheduler π and state s in M is defined as follows:

$$\mathrm{val}\_s^{\mathcal{M},\pi}(T) := \mathrm{Pr}\_{\mathcal{M}}^{\pi} \left[ \Diamond\_s^{\leqslant T} G \right],$$

where $\Diamond\_s^{\leqslant T} G = \{ s \xrightarrow{\alpha\_0, t\_0} s\_1 \xrightarrow{\alpha\_1, t\_1} s\_2 \ldots \mid \exists i : s\_i \in G \wedge \sum\_{j=0}^{i-1} t\_j \leqslant T \}$ is the set of paths starting from s and reaching G before T.
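Membership of a finite timed path in this set can be checked directly from the definition. In the sketch below a path is represented by its start state and a list of (sojourn time, next state) pairs; action labels are omitted because they play no role in the condition. The function name is hypothetical.

```python
from typing import Iterable, Set, Tuple


def reaches_goal_in_time(
    start: str,
    steps: Iterable[Tuple[float, str]],
    goal: Set[str],
    bound: float,
) -> bool:
    """Check whether some state s_i on the path is a goal state with t_0 + ... + t_{i-1} <= bound."""
    if start in goal:
        return True               # i = 0: the empty sum is 0 <= bound
    elapsed = 0.0
    for sojourn, nxt in steps:
        elapsed += sojourn
        if elapsed > bound:
            return False          # sojourn times are non-negative, so no later state qualifies
        if nxt in goal:
            return True
    return False
```

A path that spends 0.4 time units in a Markovian state and then takes an instantaneous probabilistic step into a goal state is accepted for a bound of 0.5 and rejected for a bound of 0.3.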

For opt ∈ {sup, inf}, the *optimal (time-bounded) reachability probability* (or *value*) of state s in M is defined as follows:

$$\operatorname{val}\_s^{\mathcal{M}}(T) := \operatorname{opt}\_{\pi \in \Pi\_{\mathrm{gen}}} \operatorname{val}\_s^{\mathcal{M},\pi}(T)$$

We denote by val<sup>M,π</sup>(T) (val<sup>M</sup>(T)) the vector of values val<sup>M,π</sup><sub>s</sub>(T) (val<sup>M</sup><sub>s</sub>(T)) for all s ∈ S. A general scheduler that achieves the optimum for val<sup>M</sup>(T) is called *optimal*, and one that achieves a value *v*, s. t. ||*v* − val<sup>M</sup>(T)||<sub>∞</sub> < ε, is ε*-optimal*.

*Optimal Schedulers.* For the time-bounded reachability problem it is known [RS13] that there exists an optimal scheduler π of the form π : *PS* × R<sub>≥0</sub> → Act. This scheduler does not need to know the full history of the system, but only the current probabilistic state it is in and the total time left until the time bound. It is deterministic, i. e. *not randomised*, and additionally, this scheduler is *piecewise constant*, meaning that there exists a finite partition I(π) of the time interval [0, T] into intervals I<sub>0</sub> = [t<sub>0</sub>, t<sub>1</sub>], I<sub>1</sub> = (t<sub>1</sub>, t<sub>2</sub>], ⋯, I<sub>k−1</sub> = (t<sub>k−1</sub>, t<sub>k</sub>], such that t<sub>0</sub> = 0, t<sub>k</sub> = T and the value of the scheduler remains constant throughout each interval of the partition, i. e. ∀I ∈ I(π), ∀t<sub>1</sub>, t<sub>2</sub> ∈ I, ∀s ∈ *PS* : π(s, t<sub>1</sub>) = π(s, t<sub>2</sub>). The value of π on an interval I ∈ I(π) and s ∈ *PS* is denoted by π(s, I), i. e. π(s, I) = π(s, t) for any t ∈ I.

As an example, consider the MA in Fig. 2 and time bound T = 1. Here the optimal scheduler for state $s_1$ chooses the reliable but slow action β if there is enough time, i.e. if at least 0.62 time units are left. Otherwise the optimal scheduler switches to a riskier but faster path via action α.
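A piecewise constant scheduler of this form can be stored as a sorted list of interval end points with one decision map per interval. The sketch below is only an illustration: the class name, the dictionary layout and the state and action labels are assumptions, and the switching point 0.62 is simply the rounded value quoted in the example above. The behaviour exactly at a switching point follows the interval convention $I_0 = [t_0, t_1]$, $I_i = (t_i, t_{i+1}]$.

```python
from bisect import bisect_left


class PiecewiseConstantScheduler:
    """Deterministic scheduler that depends on (state, time left) only."""

    def __init__(self, upper_ends, decisions):
        # upper_ends: right end points t_1 < ... < t_k = T of the intervals.
        # decisions[i]: maps each probabilistic state to its action on I_i.
        if len(upper_ends) != len(decisions):
            raise ValueError("one decision map per interval is required")
        self.upper_ends = list(upper_ends)
        self.decisions = list(decisions)

    def action(self, state, time_left):
        if not 0 <= time_left <= self.upper_ends[-1]:
            raise ValueError("time outside [0, T]")
        # Index of the first right end point that is >= time_left, which is
        # the interval (t_i, t_{i+1}] (or [t_0, t_1]) containing time_left.
        i = bisect_left(self.upper_ends, time_left)
        return self.decisions[i][state]


# Hypothetical encoding of the example: one switching point at 0.62, T = 1.
pi = PiecewiseConstantScheduler(
    [0.62, 1.0],
    [{"s1": "alpha"}, {"s1": "beta"}],
)
```

Looking up a decision costs one binary search over the switching points, so the representation stays cheap even when the partition has many intervals.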

In the literature this subclass of schedulers is sometimes referred to as *total-time positional deterministic, piecewise constant schedulers*. From now on we call a scheduler from this subclass simply a *scheduler (or strategy)* and denote the set of such schedulers by Π. An important notion for schedulers is the *switching point*, the point in time separating two intervals of constant decisions:

**Definition 2.** *For a scheduler* $\pi$ *and* $s \in \mathit{PS}$ *we call* $\tau \in \mathbb{R}_{\geqslant 0}$ *a* switching point*, iff* $\exists I_1, I_2 \in \mathcal{I}(\pi)$*, s.t.* $\tau = \sup I_1$ *and* $\tau = \inf I_2$ *and* $\exists s \in \mathit{PS} : \pi(s, I_1) \neq \pi(s, I_2)$*.*

Whether the switching points can be computed exactly is an open problem. In fact, the Lindemann-Weierstrass theorem suggests that switching points are non-algebraic numbers, which hints at a negative answer.

#### **3 Related Work**

In this section we briefly review the algorithms designed to approximate time-bounded reachability probabilities. We only discuss algorithms that are guaranteed to compute an ε-close approximation of the reachability value.

The majority of the algorithms [Neu10,BS11,FRSZ16,SSM18,BHHK15] are available for continuous time Markov decision processes (CTMDPs) [Ber00]. Two of those, [Neu10] and [BHHK15], are also applicable to MA. We compare with them in our empirical evaluation in Sect. 5. All the algorithms utilise known techniques such as discretisation, uniformisation, or a combination thereof. The drawback of most of the algorithms is that they do not adapt to the specific problem instance. Namely, given a model $\mathcal{M}$ to analyse, they perform as many computations as are needed for a model $\widehat{\mathcal{M}}$, the worst-case model in a subclass of models that share certain parameters with $\mathcal{M}$, such as $\mathbf{E}_{\max}$. Experimental evaluation performed in [BHHK15] shows that such approaches are not promising, because most of the time the algorithms perform too many unnecessary computations. This is not the case for [BS11] and [BHHK15]. The latter performs the analysis via uniformisation and schedulers that cannot observe time. The former, designed for CTMDPs, discretises the time horizon with intervals of variable length, but it is not applicable to MA. Just like [BS11], our approach adapts the discretisation of the time horizon to the specific problem instance.

### **4 Our Solution**

In this section we present a novel approach to approximating optimal time-bounded reachability and the optimal scheduler for an arbitrary Markov automaton. Throughout the section we work with an MA $\mathcal{M} = (S, \mathrm{Act}, \mathbf{P}, \mathrm{Q}, G)$, time bound $T \in \mathbb{Q}_{\geqslant 0}$ and precision $\varepsilon \in \mathbb{Q}_{>0}$. To simplify the presentation we concentrate on the supremum reachability probability.

Given a scheduler, computation (or approximation) of the reachability probability is relatively easy:

**Lemma 1.** *For a scheduler* $\pi \in \Pi$ *and a state* $s \in S$*, the function* $\mathrm{val}_s^{\mathcal{M},\pi} : [0, T] \to [0, 1]$ *is the solution to the following system of equations:*

$$\begin{aligned} f\_s(t) &= 1 & \text{if } s \in G\\ -\frac{\text{d}f\_s(t)}{\text{d}t} &= \sum\_{s' \in S} \text{Q}(s, s') \cdot f\_{s'}(t) & \text{else if } s \in MS\\ f\_s(t) &= \sum\_{s' \in S} \mathbb{P}[s, \pi(s, t), s'] \cdot f\_{s'}(t) & \text{else if } s \in PS \end{aligned} \tag{1}$$
 
$$f\_s(0) = \begin{cases} 1 & \text{if } s \in G\\ \sum\_{s' \in S} \mathbb{P}[s, \pi(s, 0), s'] \cdot f\_{s'}(0) & \text{else if } s \in PS\\ 0 & \text{otherwise} \end{cases} \tag{2}$$

Let $0 = \tau_0 < \tau_1 < \ldots < \tau_{k-1} < \tau_k = T$, where $\tau_i$ are the switching points of $\pi$ for $i = 1..k-1$. The solution of the system of Equations (1)–(2) can be obtained separately on each of the intervals $(\tau_{i-1}, \tau_i]$, $\forall i = 1..k$, where the value of the scheduler remains constant for all states. Given the solution $\mathrm{val}_s^{\mathcal{M},\pi}(t)$ on interval $(\tau_{i-1}, \tau_i]$, we derive the solution for $(\tau_i, \tau_{i+1}]$ by using the values $\mathrm{val}_s^{\mathcal{M},\pi}(\tau_i)$ as boundary conditions. Later in Sect. 4.1 we will show that the approximation of the solution for each interval $(\tau_{i-1}, \tau_i]$ can be achieved via a combination of known techniques, such as *uniformisation* (for the Markovian states) and *untimed reachability analysis* (for probabilistic states).

Thus, given an optimal scheduler, Lemma 1 can be used to compute or approximate the optimal reachability value. Finding an optimal scheduler is therefore *the* challenge for optimal time-bounded reachability analysis. Our solution is based on approximating the optimal reachability value up to an arbitrary ε > 0 by discretising the time horizon with intervals of variable length. On each interval the value of our ε-optimal scheduler remains constant. The discretisation we use attempts to reflect the partition $\mathcal{I}(\pi)$ of a minimal<sup>2</sup> optimal scheduler $\pi$, i.e. it mimics the intervals on which $\pi$ has constant value.

Our solution is presented in Algorithm 1. It computes an ε-optimal scheduler $\pi_{\mathrm{opt}}$ and approximates the solution of the system of Equations (1)–(2) for $\pi_{\mathrm{opt}}$. The algorithm iterates over intervals of constant decisions of an ε-optimal strategy. At each iteration it computes: (i) a stationary scheduler $\pi$ that is close to optimal on the current interval (line 7), (ii) the length $\delta$ of the interval on which $\pi$ introduces an acceptable error (line 8) and (iii) the reachability values for time $t + \delta$ (line 9). The following sections discuss the steps of the algorithm in more detail.

<sup>2</sup> In the size of $\mathcal{I}(\pi)$.

**Theorem 1.** *Algorithm 1 approximates the value of an arbitrary Markov automaton for time bound* $T \in \mathbb{Q}_{\geqslant 0}$ *up to a given* $\varepsilon \in \mathbb{Q}_{>0}$*.*

#### **Algorithm 1.** SwitchStep

**Input:** MA $\mathcal{M} = (S, \mathrm{Act}, \mathbf{P}, \mathrm{Q}, G)$, time bound $T \in \mathbb{Q}_{\geqslant 0}$, precision $\varepsilon \in \mathbb{Q}_{>0}$

**Output:** $\boldsymbol{u}(T) \in [0, 1]^{|S|}$, s.t. $\|\boldsymbol{u}(T) - \mathrm{val}^{\mathcal{M}}(T)\|_\infty < \varepsilon$, ε-optimal scheduler $\pi_{\mathrm{opt}}$

**Parameters:** $w \in (0, 1)$, and $\varepsilon_i < \varepsilon$, by default $w = 0.1$, $\varepsilon_i = w \cdot \varepsilon$

1: $\delta_{\min} = (1 - w) \cdot 2 \cdot (\varepsilon - \varepsilon_i) / \mathbf{E}_{\max}^2 / T$

2: $\varepsilon_\Psi = \varepsilon_r = w \varepsilon \delta_{\min} / T$

3: $t = 0$, $\varepsilon^{t}_{\mathrm{acc}} = \varepsilon_i$

4: $\forall s \in \mathit{MS} : \boldsymbol{u}_s(t) = (s \in G)\,?\,1 : 0$ and $\forall s \in \mathit{PS} : \boldsymbol{u}_s(t) = \mathcal{R}^{*}_{\varepsilon_i}(s, G)$

5: $\forall s \in \mathit{PS} : \pi_{\mathrm{opt}}(s, 0) = \arg\max \mathcal{R}^{*}_{\varepsilon_i}(s, G)$

6: **while** $t < T$ **do**

7: &emsp;$\pi = $ FindStrategy$(\boldsymbol{u}(t))$

8: &emsp;$\delta, \varepsilon_\delta = $ FindStep$(\mathcal{M}, T - t, \delta_{\min}, \boldsymbol{u}(t), \varepsilon_\Psi, \varepsilon_r, \pi)$

9: &emsp;compute $\boldsymbol{u}(t + \delta)$ according to (5) for $\varepsilon_\Psi$ and $\varepsilon_r$

10: &emsp;$t = t + \delta$, $\varepsilon^{t}_{\mathrm{acc}} = \varepsilon^{t-\delta}_{\mathrm{acc}} + \varepsilon_\delta$

11: &emsp;$\forall s \in \mathit{PS}, \tau \in (0, \delta] : \pi_{\mathrm{opt}}(s, t + \tau) = \pi(s)$

12: **return** $\boldsymbol{u}_s(T)$, $\pi_{\mathrm{opt}}$
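The initialisation in lines 1 and 2 of Algorithm 1 is plain arithmetic and can be sketched as follows. The function name and argument names are hypothetical, and the check `T > 0` is an added assumption to avoid a division by zero; it is not part of the algorithm as stated.

```python
def switchstep_parameters(eps, e_max, T, w=0.1, eps_i=None):
    """Return (delta_min, eps_psi, eps_r) as in lines 1-2 of Algorithm 1."""
    if eps_i is None:
        eps_i = w * eps  # default choice of the initial error budget
    if not (0 < w < 1 and 0 <= eps_i < eps and e_max > 0 and T > 0):
        raise ValueError("invalid parameters")
    # Line 1: minimal step size.
    delta_min = (1 - w) * 2 * (eps - eps_i) / e_max ** 2 / T
    # Line 2: error budgets for the Poisson truncation and for the
    # untimed reachability approximation.
    eps_psi = eps_r = w * eps * delta_min / T
    return delta_min, eps_psi, eps_r
```

For example, with ε = 10<sup>−3</sup>, maximal exit rate 10, T = 1 and the default w = 0.1, the minimal step is 1.62 · 10<sup>−5</sup> and both error budgets are 1.62 · 10<sup>−9</sup>.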

#### **4.1 Computing the Reachability Value**

In this section we discuss steps 4 and 9, which require computation of the reachability probability according to the system of Equations (1)–(2). Our approach is based on the approximation of the solution. The presence of two types of states, probabilistic and Markovian, demands separate treatment of each. Informally, we combine two techniques: time-bounded reachability analysis on continuous time Markov chains<sup>3</sup> for Markovian states and time-unbounded reachability analysis on discrete time Markov chains<sup>4</sup> for probabilistic states. Parameters $w$ and $\varepsilon_i$ of Algorithm 1 control the error allowed by the approximation. Here $\varepsilon_i$ bounds the error for the very first instance of time-unbounded reachability in line 4, while $w$ defines the fraction of the error that can be used by the approximations in subsequent iterations ($\varepsilon_\Psi$ and $\varepsilon_r$).

We start with time-unbounded reachability analysis for probabilistic states. Let $\pi \in \Pi_{\mathrm{stat}}$, $s, s' \in S$. We define

<sup>3</sup> Markov automata without probabilistic states.

<sup>4</sup> Markov automata without Markovian states and such that <sup>∀</sup><sup>s</sup> <sup>∈</sup> *PS* : <sup>|</sup>Act(s)<sup>|</sup> = 1.


$$\mathcal{R}(s,\pi,s') = \begin{cases} 1 & \text{if } s = s'\\ \sum\_{p \in S} \mathbb{P}[s,\pi(s),p] \cdot \mathcal{R}(p,\pi,s') & \text{else if } s \in PS\\ 0 & \text{otherwise} \end{cases} \tag{3}$$

This value denotes the probability to reach state $s'$ starting from state $s$ by performing any number of probabilistic transitions and no Markovian transitions. This system of linear equations can be either solved exactly, e.g. via Gaussian elimination, or approximated by numerical methods. If $\mathcal{R}(s, \pi, s')$ is under-approximated we denote it by $\mathcal{R}_{\epsilon}(s, \pi, s')$, where $\epsilon$ is the approximation error. For $A \subseteq S$ we define $\mathcal{R}(s, \pi, A) = \sum_{s' \in A} \mathcal{R}(s, \pi, s')$ and $\mathcal{R}_{\epsilon}(s, \pi, A) = \sum_{s' \in A} \mathcal{R}_{\epsilon}(s, \pi, s')$.
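To make the exact option concrete, the following sketch solves system (3) for one fixed target state by Gauss-Jordan elimination over rational numbers. The function name and the dictionary-based model layout are assumptions made for illustration. The sketch also assumes that the linear system is non-singular, i.e. that probabilistic states cannot cycle forever among themselves with probability 1; otherwise the pivot search fails.

```python
from fractions import Fraction


def reach_prob_untimed(prob_states, trans, target):
    """Solve (3) exactly for a fixed stationary scheduler and target state.

    trans[s] maps each successor of probabilistic state s to its probability
    under the chosen action pi(s); every other state counts as Markovian.
    """
    unknowns = [s for s in prob_states if s != target]
    idx = {s: i for i, s in enumerate(unknowns)}
    n = len(unknowns)
    # Augmented matrix of (I - P) x = b, restricted to the unknown states.
    a = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for s in unknowns:
        i = idx[s]
        a[i][i] += 1
        for succ, pr in trans[s].items():
            pr = Fraction(pr)
            if succ == target:
                a[i][n] += pr          # R(target, pi, target) = 1
            elif succ in idx:
                a[i][idx[succ]] -= pr  # another unknown probabilistic state
            # Markovian successors other than the target contribute 0.
    for col in range(n):
        piv = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[piv] = a[piv], a[col]
        a[col] = [v / a[col][col] for v in a[col]]
        for r in range(n):
            if r != col and a[r][col] != 0:
                f = a[r][col]
                a[r] = [vr - f * vc for vr, vc in zip(a[r], a[col])]
    result = {s: a[idx[s]][n] for s in unknowns}
    result[target] = Fraction(1)
    return result
```

Exact rational arithmetic avoids rounding issues on small systems; for large models an iterative under-approximation, as mentioned above, is the more realistic choice.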

For time bound 0 and $s \in \mathit{PS}$, the value $\mathrm{val}_s^{\mathcal{M}}(0)$ is the optimal probability to reach any goal state via only probabilistic transitions. We denote it by $\mathcal{R}^{*}(s, G) = \max_{\pi \in \Pi_{\mathrm{stat}}} \mathcal{R}(s, \pi, G)$ (step 4). It is a well-known problem on *discrete time Markov decision processes* [Put94] and can be computed or approximated by policy iteration, linear programming [Put94] or interval value iteration [HM14,QK18,BKL+17]. If the value is approximated up to $\epsilon$, we denote it by $\mathcal{R}^{*}_{\epsilon}(s, G)$.
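As a rough illustration of how such a value can be approximated, the sketch below runs plain value iteration from below on the probabilistic states. This is a simplification and not one of the cited methods: unlike interval value iteration, its stopping criterion gives no sound error bound. The function name and the data layout are assumptions.

```python
def optimal_untimed_reach(prob_states, actions, goal, sweeps=1000, tol=1e-12):
    """Approximate R*(s, G) from below and return a greedy action per state.

    actions[s][a] maps successors to probabilities for action a in state s.
    States outside prob_states are Markovian: they count 1 if they are goal
    states and 0 otherwise, since no Markovian transition may be taken.
    """
    value = {s: (1.0 if s in goal else 0.0) for s in prob_states}

    def q(s, a):
        return sum(pr * (1.0 if succ in goal else value.get(succ, 0.0))
                   for succ, pr in actions[s][a].items())

    for _ in range(sweeps):
        change = 0.0
        for s in prob_states:
            if s in goal:
                continue
            new = max(q(s, a) for a in actions[s])
            change = max(change, abs(new - value[s]))
            value[s] = new
        if change < tol:
            break
    policy = {s: max(actions[s], key=lambda a: q(s, a))
              for s in prob_states if s not in goal}
    return value, policy
```

The returned greedy actions play the role of the arg max in line 5 of Algorithm 1.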

The reachability analysis on Markovian states is solved with the well-known *uniformisation* approach [Jen53]. Informally, Markovian states will be implicitly *uniformised*: the exit rate of each Markovian state is made equal to $\mathbf{E}_{\max}$ by adding a self-loop transition, which does not affect the reachability value.

We will first define the discrete probability to reach the target vector within $k$ Markovian transitions. Let $\boldsymbol{x} \in [0, 1]^{|S|}$ be a vector of values for each state. For $k \in \mathbb{N}_{\geqslant 0}$, $\pi \in \Pi_{\mathrm{stat}}$ we define $\mathbf{D}^{k}_{\boldsymbol{x}}(s, \pi) = 1$ if $s \in G$ and otherwise:

$$\mathbf{D}\_{x}^{k}(s,\pi) = \begin{cases} \mathbf{x}\_{s} & \text{if } k=0\\ \sum\_{s' \neq s} \frac{\mathbf{Q}(s,s')}{\mathbf{E}\_{\text{max}}} \cdot \mathbf{D}\_{x}^{k-1}(s',\pi) + (1 - \frac{\mathbf{E}(s)}{\mathbf{E}\_{\text{max}}}) \cdot \mathbf{D}\_{x}^{k-1}(s,\pi) & \text{if } k>0, s \in MS\\ \sum\_{s' \in MS \cup G} \mathcal{R}(s,\pi,s') \cdot \mathbf{D}\_{x}^{k}(s',\pi) & \text{if } k>0, s \in PS \end{cases} \tag{4}$$

The value $\mathbf{D}^{k}_{\boldsymbol{x}}(s, \pi)$ is the weighted sum over all states $s'$ of the value $\boldsymbol{x}_{s'}$ and the probability to reach $s'$ starting from $s$ within $k$ Markovian transitions. Therefore the counter $k$ decreases only when a Markovian state performs a transition and is not affected by probabilistic transitions. If the values $\mathcal{R}(s, \pi, s')$ are approximated up to precision $\epsilon$, i.e. $\mathcal{R}_{\epsilon}(s, \pi, s')$ is used for probabilistic states instead of $\mathcal{R}(s, \pi, s')$ in (4), we use the notation $\mathbf{D}^{k}_{\boldsymbol{x},\epsilon}(s, \pi)$.

We denote by $\Psi_\lambda$ the probability mass function of the Poisson distribution with parameter $\lambda$. For $\tau \in \mathbb{R}_{\geqslant 0}$ and $\varepsilon_\Psi \in (0, 1]$, $N(\tau, \varepsilon_\Psi)$ is some natural number satisfying $\sum_{i=0}^{N(\tau, \varepsilon_\Psi)} \Psi_{\mathbf{E}_{\max} \cdot \tau}(i) \geqslant 1 - \varepsilon_\Psi$, e.g. $N(\tau, \varepsilon_\Psi) = \lceil \mathbf{E}_{\max} \cdot \tau \cdot e^2 - \ln(\varepsilon_\Psi) \rceil$ [BHHK15], where $e$ is Euler's number.
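The closed-form truncation point is easy to evaluate and to check numerically. The sketch below assumes that the bound is rounded up to the next natural number; the function names are hypothetical.

```python
import math


def poisson_pmf(lam, i):
    """Poisson probability mass Psi_lam(i), evaluated in log-space."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def poisson_cdf(lam, n):
    """Probability of at most n events."""
    return sum(poisson_pmf(lam, i) for i in range(n + 1))


def truncation_point(e_max, tau, eps_psi):
    """N(tau, eps_psi) = ceil(E_max * tau * e^2 - ln(eps_psi))."""
    if not 0 < eps_psi <= 1:
        raise ValueError("eps_psi must lie in (0, 1]")
    return math.ceil(e_max * tau * math.e ** 2 - math.log(eps_psi))


n = truncation_point(10.0, 0.5, 1e-6)
covered = poisson_cdf(10.0 * 0.5, n)
```

For maximal exit rate 10, τ = 0.5 and truncation error 10<sup>−6</sup> this gives N = 51, and the Poisson mass covered by the first 52 terms is indeed at least 1 − 10<sup>−6</sup>. The bound is conservative: a much smaller N would already suffice here.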

We are now in a position to describe a way to compute $\boldsymbol{u}(t + \delta)$ at line 9 of Algorithm 1. Let $\boldsymbol{u}(t) \in [0, 1]^{|S|}$ be the vector of values computed by the previous iteration of Algorithm 1 for time $t$. Let $\widetilde{\mathrm{val}}^{\mathcal{M},\pi}(t + \delta)$ be the solution of the system of Equations (1) for time point $t + \delta$ and a stationary scheduler $\pi : \mathit{PS} \to \mathrm{Act}$, where $\boldsymbol{u}(t)$ is used instead of $\mathrm{val}^{\mathcal{M},\pi}(t)$ as the boundary condition<sup>5</sup>. The following lemma shows that $\widetilde{\mathrm{val}}^{\mathcal{M},\pi}(t + \delta)$ can be efficiently approximated up to $\varepsilon_\Psi + \varepsilon_r$:

**Lemma 2.** *Let* $\varepsilon_\Psi \in (0, 1]$, $\varepsilon_r \in [0, 1]$, $\varepsilon_N = \varepsilon_r / N(T - t, \varepsilon_\Psi)$ *and* $\delta \in [0, T - t]$*. Then* $\forall s \in S : \boldsymbol{u}_s(t + \delta) \leqslant \widetilde{\mathrm{val}}_s^{\mathcal{M},\pi}(t + \delta) \leqslant \boldsymbol{u}_s(t + \delta) + \varepsilon_\Psi + \varepsilon_r$*, where:*

$$\mathbf{u}\_{s}(t+\delta) = \begin{cases} 1 & \text{if } s \in G\\ \sum\_{i=0}^{N(\delta,\varepsilon\_\Psi)} \Psi\_{\mathbf{E}\_{\max}\cdot\delta}(i) \cdot \mathbf{D}^{i}\_{\mathbf{u}(t),\varepsilon\_{N}}(s,\pi) & \text{else if } s \in MS\\ \sum\_{s' \in MS\cup G} \mathcal{R}\_{\varepsilon\_N}(s,\pi,s') \cdot \mathbf{u}\_{s'}(t+\delta) & \text{else if } s \in PS \end{cases} \tag{5}$$
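Equations (4) and (5) translate into a short loop that propagates the vector $\mathbf{D}^{i}$ and accumulates it with Poisson weights. The sketch below is a minimal illustration under several assumptions: the values $\mathcal{R}(s, \pi, s')$ are given exactly in a hypothetical dictionary `prob_reach` (so the error $\varepsilon_r$ is zero), the argument `n_terms` plays the role of $N(\delta, \varepsilon_\Psi)$, and the vector `u` contains an entry for every state.

```python
import math


def poisson_pmf(lam, i):
    """Poisson probability mass Psi_lam(i), evaluated in log-space."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def uniformisation_step(markov, prob_reach, goal, u, delta, n_terms):
    """Approximate u(t + delta) from u(t) following (4) and (5).

    markov[s][s2]     rate Q(s, s2) of Markovian state s (s2 != s)
    prob_reach[s][s2] value R(s, pi, s2) for probabilistic s, s2 in MS or G
    """
    exit_rate = {s: sum(q.values()) for s, q in markov.items()}
    e_max = max(exit_rate.values())
    d = {s: (1.0 if s in goal else u[s]) for s in u}   # D^0
    acc = {s: 0.0 for s in markov}
    for i in range(n_terms + 1):
        if i > 0:
            new = {s: 1.0 for s in goal}
            for s, q in markov.items():                # Markovian case of (4)
                if s not in goal:
                    jump = sum(rate / e_max * d[s2] for s2, rate in q.items())
                    new[s] = jump + (1.0 - exit_rate[s] / e_max) * d[s]
            for s, r in prob_reach.items():            # probabilistic case of (4)
                if s not in goal:
                    new[s] = sum(pr * new[s2] for s2, pr in r.items())
            d = new
        weight = poisson_pmf(e_max * delta, i)
        for s in markov:
            acc[s] += weight * d[s]
    result = {s: 1.0 for s in goal}
    for s in markov:
        if s not in goal:
            result[s] = acc[s]
    for s, r in prob_reach.items():                    # last case of (5)
        if s not in goal:
            result[s] = sum(pr * result[s2] for s2, pr in r.items())
    return result
```

On a single Markovian state with rate 2 into the goal, a step of length 0.5 yields approximately 1 − e<sup>−1</sup>, and two consecutive steps of length 0.5 yield approximately 1 − e<sup>−2</sup>, as expected for an exponential delay.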

#### **4.2 Choosing a Strategy**

The strategy for the next interval is computed in Step 7 and implicitly in Step 4. The latter has been discussed in Sect. 4.1. We proceed to Step 7.

Here we search for a strategy that remains constant for all time points within the interval $(t, t + \delta]$, for some $\delta > 0$, and introduces only an acceptable error. Analogously to results for *continuous time Markov decision processes* [Mil68], we prove that the derivatives of the function $\boldsymbol{u}(\tau)$ at time $\tau = t$ help to find a strategy $\pi$ that remains optimal on the interval $(t, t + \delta]$, for some $\delta > 0$. This is rooted in the Taylor expansion of the function $\boldsymbol{u}(t + \delta)$ via the values of $\boldsymbol{u}(t)$. We define sets

$$\begin{aligned} \mathcal{F}\_0 &= \{ \pi \in \Pi\_{\texttt{stat}} \mid \forall s \in PS : \pi = \text{arg}\,\text{max}\_{\pi' \in \Pi\_{\texttt{stat}}} \mathsf{d}\_{\pi'}^{(0)}(s) \} \\ \mathcal{F}\_i &= \{ \pi \in \mathcal{F}\_{i-1} \mid \forall s \in PS : \pi = \text{arg}\,\text{max}\_{\pi' \in \mathcal{F}\_{i-1}} (-1)^{i-1} \mathsf{d}\_{\pi'}^{(i)}(s) \}, i \geqslant 1, \end{aligned}$$

where for $\pi \in \Pi_{\mathrm{stat}}$, $s \in G$ : $\boldsymbol{d}^{(0)}_{\pi}(s) = 1$, for $s \in \mathit{MS} \setminus G$ : $\boldsymbol{d}^{(0)}_{\pi}(s) = \boldsymbol{u}_s(t)$, for $s \in \mathit{PS} \setminus G$ : $\boldsymbol{d}^{(0)}_{\pi}(s) = \sum_{s' \in \mathit{MS} \cup G} \mathcal{R}(s, \pi, s') \cdot \boldsymbol{u}_{s'}(t)$ and for $i \geqslant 1$:

$$\mathbf{d}^{(i)}\_{\pi}(s) = \begin{cases} 0 & \text{if } s \in G\\ \sum\limits\_{s' \in S} \mathbf{Q}(s, s') \cdot \mathbf{d}^{(i-1)}(s') & \text{if } s \in MS \backslash G\\ \sum\limits\_{s' \in MS} \mathcal{R}(s, \pi, s') \cdot \mathbf{d}^{(i)}(s') & \text{if } s \in PS \backslash G \end{cases} \qquad \mathbf{d}^{(i)} = \mathbf{d}^{(i)}\_{\pi} \text{ for any } \pi \in \mathcal{F}\_i.$$

The value $\boldsymbol{d}^{(i)}_{\pi}(s)$ is the $i$-th derivative of $\boldsymbol{u}_s(t)$ at time $t$ for a scheduler $\pi$.

**Lemma 3.** *If* $\pi \in \mathcal{F}_{|S|+1}$ *then* $\exists \delta > 0$ *such that* $\pi$ *is optimal on* $(t, t + \delta]$*.*

Thus in order to compute a stationary strategy that is optimal on the time interval $(t, t + \delta]$, for some $\delta > 0$, one needs to compute at most $|S| + 1$ derivatives of $\boldsymbol{u}(\tau)$ at time $t$. Procedure FindStrategy does exactly that. It computes the sets $\mathcal{F}_i$ until for some $j \in 0..(|S| + 1)$ there is only one strategy left, i.e. $|\mathcal{F}_j| = 1$. Otherwise it outputs any strategy in $\mathcal{F}_{|S|+1}$. Similarly to Sect. 4.1, the scheduler that maximises the values $\mathcal{R}(s, \pi, s')$ can be approximated. This question and other optimisations are discussed in detail in Sect. 4.4.

<sup>5</sup> $\widetilde{\mathrm{val}}^{\mathcal{M},\pi}(t + \delta)$ may differ from $\mathrm{val}^{\mathcal{M},\pi}(t + \delta)$ since its boundary condition $\boldsymbol{u}(t)$ is an approximation of the boundary condition $\mathrm{val}^{\mathcal{M},\pi}(t)$, used by $\mathrm{val}^{\mathcal{M},\pi}(t + \delta)$.
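The refinement of the sets $\mathcal{F}_i$ can be sketched as a lexicographic filter over candidate strategies. The version below is a brute-force illustration only: it assumes that the stationary strategies are enumerated explicitly and that a hypothetical callback `derivative(pi, i)` returns the values $\boldsymbol{d}^{(i)}_{\pi}(s)$ for all probabilistic states. A practical implementation would work state by state instead of enumerating strategies.

```python
def find_strategy(strategies, prob_states, derivative, max_order, tol=0.0):
    """Keep refining the candidate set F_i until one strategy is left.

    derivative(pi, i) must return a dict that maps every state in
    prob_states to d^(i)_pi(s); max_order corresponds to |S| + 1.
    """
    candidates = list(strategies)
    for i in range(max_order + 1):
        # F_0 maximises d^(0); F_i maximises (-1)^(i-1) * d^(i) for i >= 1.
        sign = 1.0 if i == 0 else (-1.0) ** (i - 1)
        scores = []
        for pi in candidates:
            vals = derivative(pi, i)
            scores.append({s: sign * vals[s] for s in prob_states})
        best = {s: max(sc[s] for sc in scores) for s in prob_states}
        kept = [pi for pi, sc in zip(candidates, scores)
                if all(sc[s] >= best[s] - tol for s in prob_states)]
        if not kept:
            raise ValueError("no candidate is maximal in every state")
        candidates = kept
        if len(candidates) == 1:
            break
    return candidates[0]
```

The tolerance `tol` is an added knob for floating-point comparisons and defaults to exact comparison.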

#### **4.3 Finding Switching Points**

Given that a strategy $\pi$ is computed by FindStrategy, we need to know for how long this strategy can be followed before the action has to change for at least one of the states. We consider the behaviour of the system in the time interval $[t, T]$. Recall the function $\widetilde{\mathrm{val}}^{\pi}(t + \delta)$, $\delta \in [0, T - t]$, defined in Sect. 4.1 (Lemma 2) as the solution of the system of Equations (1) with the boundary condition $\boldsymbol{u}(t)$, for a stationary scheduler $\pi$. For a probabilistic state $s$ the following holds:

$$\widetilde{\mathrm{val}}\_s^{\pi}(t+\delta) = \sum\_{s' \in MS \cup G} \mathcal{R}(s,\pi,s') \cdot \widetilde{\mathrm{val}}\_{s'}^{\pi}(t+\delta) \tag{6}$$

Let s ∈ *PS*, π ∈ Πstat, α ∈ Act(s). Consider the following function:

$$\widetilde{\mathrm{val}}\_{s}^{\pi,s\to\alpha}(t+\delta) = \sum\_{s' \in MS \cup G} \underbrace{\sum\_{s'' \in S} \mathbb{P}[s,\alpha,s''] \cdot \mathcal{R}(s'',\pi,s') \cdot \widetilde{\mathrm{val}}\_{s'}^{\pi}(t+\delta)}\_{\mathcal{R}\_{s\to\alpha}(s,\pi,s')}$$

This function denotes the reachability value for time bound $t + \delta$ and a scheduler that differs from $\pi$ as follows: all states follow strategy $\pi$, except for state $s$, which selects action $\alpha$ for the very first transition and afterwards selects action $\pi(s)$. Between two switching points the strategy $\pi$ is optimal and therefore the value of $\widetilde{\mathrm{val}}_s^{\pi,s\to\alpha}(t + \delta)$ is not greater than $\widetilde{\mathrm{val}}_s^{\pi}(t + \delta)$ for all $s \in \mathit{PS}$, $\alpha \in \mathrm{Act}(s)$. If for some $\delta \in [0, T - t]$, $s \in \mathit{PS}$, $\alpha \in \mathrm{Act}(s)$ it holds that $\widetilde{\mathrm{val}}_s^{\pi,s\to\alpha}(t + \delta) > \widetilde{\mathrm{val}}_s^{\pi}(t + \delta)$, then action $\alpha$ is better for $s$ than $\pi(s)$, and therefore $\pi(s)$ is not optimal for $s$ at $t + \delta$. We show that the next switching point after time point $t$ is such a value $t + \delta$, $\delta \in (0, T - t]$, that

$$\begin{aligned} \forall s \in PS, \forall \alpha \in \text{Act}(s), \forall \tau \in [0, \delta): &\widetilde{\text{val}\_s^\pi}(t + \tau) \geqslant \widetilde{\text{val}\_s^{\pi, s \to \alpha}}(t + \tau) \\ \text{and } \exists s \in PS, \alpha \in \text{Act}(s): &\widetilde{\text{val}\_s^\pi}(t + \delta) < \widetilde{\text{val}\_s^{\pi, s \to \alpha}}(t + \delta) \end{aligned} \tag{7}$$

Procedure FindStep approximates switching points iteratively. It splits the time interval $[0, T]$ into subintervals $[t_1, t_2], \ldots, [t_{n-1}, t_n]$ and at each iteration $k$ checks whether (7) holds for some $\delta \in [t_k, t_{k+1}]$. The latter is performed by procedure CheckInterval. If (7) does not hold for any $\delta \in [t_k, t_{k+1}]$, FindStep repeats with an increased $k$. Otherwise, it outputs the largest $\delta \in [t_k, t_{k+1}]$ for which (7) does not hold (line 11). This is done by binary search up to distance $\delta_{\min}$. Later in this section we will show that establishing that (7) does not hold for all $\delta \in [t_k, t_{k+1}]$ can be efficiently performed by considering only 2 time points of the interval $[t_k, t_{k+1}]$ and a subset of state-action pairs.

#### **Algorithm 2.** FindStep

**Input:** MA $\mathcal{M} = (S, \mathrm{Act}, \mathbf{P}, \mathrm{Q}, G)$, time left $t \in \mathbb{Q}_{\geqslant 0}$, minimal step size $\delta_{\min}$, vector $\boldsymbol{u} \in [0, 1]^{|S|}$, $\varepsilon_\Psi \in (0, 1]$, $\varepsilon_r \in [0, 1]$, $\pi \in \Pi_{\mathrm{stat}}$

**Output:** step $\delta \in [\delta_{\min}, t]$ and upper bound on accumulated error $\varepsilon_\delta \geqslant 0$

1: **if** ($t \leqslant \delta_{\min}$) **then return** $t$, $(\mathbf{E}_{\max} \cdot t)^2 / 2$

2: $k = 1$, $t_1 = \delta_{\min}$

3: **do**

4: &emsp;$t_{k+1} = \min\{t, T_\Psi(k + 1, \varepsilon_\Psi), (t_k \cdot \mathbf{E}_{\max} + 1) / \mathbf{E}_{\max}\}$

5: &emsp;set $A = T_{\max}(k + 1)$ or $A = \mathit{PS} \times \mathrm{Act}$ (see the discussion at the end of Sect. 4.3)

6: &emsp;*toswitch* $=$ CheckInterval$(\mathcal{M}, [t_k, t_{k+1}], A, \varepsilon_\Psi, \varepsilon_r)$

7: &emsp;$k = k + 1$

8: **while** (not *toswitch*) and ($t_k < t$)

9: $k = k - 1$

10: **if** (*toswitch* = true) **then**

11: &emsp;find the largest $\delta \in [t_k, t_{k+1}]$, s.t. CheckInterval$(\mathcal{M}, [t_k, \delta], A, \varepsilon_\Psi, \varepsilon_r)$ = false

12: &emsp;**if** ($\delta > \delta_{\min}$) **then** $\varepsilon_\delta = 0$ **else** $\varepsilon_\delta = (\mathbf{E}_{\max} \delta_{\min})^2 / 2$

13: &emsp;**return** $\delta$, $\varepsilon_\delta$

14: **else return** $t$, 0
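The control flow of Algorithm 2 can be sketched as follows. This is an illustration under assumptions, not the implementation evaluated in the paper: `t_psi` and `check_interval` are hypothetical callbacks standing in for $T_\Psi(k, \varepsilon_\Psi)$ and CheckInterval, the choice of the set $A$ is hidden inside `check_interval`, and the binary search assumes that `check_interval(a, b)` is monotone in `b`, i.e. once it reports a switch for some right end point it also does so for every larger one.

```python
def find_step(t_left, delta_min, e_max, t_psi, check_interval):
    """Return (delta, eps_delta) following the structure of Algorithm 2."""
    if t_left <= delta_min:                                  # line 1
        return t_left, (e_max * t_left) ** 2 / 2
    k = 1
    t = {1: delta_min}                                       # line 2
    while True:                                              # lines 3-8
        t[k + 1] = min(t_left, t_psi(k + 1), (t[k] * e_max + 1) / e_max)
        toswitch = check_interval(t[k], t[k + 1])
        k += 1
        if toswitch or t[k] >= t_left:
            break
    k -= 1                                                   # line 9
    if not toswitch:
        return t_left, 0.0                                   # line 14
    # Line 11: binary search, up to distance delta_min, for the largest
    # delta such that no switch is reported on [t_k, delta].
    lo, hi = t[k], t[k + 1]
    while hi - lo > delta_min:
        mid = (lo + hi) / 2
        if check_interval(t[k], mid):
            hi = mid
        else:
            lo = mid
    delta = lo
    eps_delta = 0.0 if delta > delta_min else (e_max * delta_min) ** 2 / 2
    return delta, eps_delta                                  # lines 12-13
```

With a stub `check_interval` that reports a switch as soon as the right end point exceeds 0.37, the returned step lies just below 0.37 and within `delta_min` of it.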

*Selecting* $t_k$. This step is a heuristic. The correctness of our algorithm does not depend on the choice of $t_k$, but its runtime is expected to benefit from a good choice: the runtime of FindStrategy is best given an oracle that produces time points $t_k$ which are exactly the switching points of the optimal strategy. Any other heuristic is just a guess.

At every iteration $k$ we choose a time point $t_k$ such that the MA is very likely to perform at most $k$ Markovian transitions within time $t_k$. "Very likely" here means with probability $1 - \varepsilon_\Psi$. For $k \in \mathbb{N}$ we define $T_\Psi(k, \varepsilon_\Psi)$ as follows: $T_\Psi(1, \varepsilon_\Psi) = \delta_{\min}$, and for $k > 1$: $T_\Psi(k, \varepsilon_\Psi)$ satisfies $\sum_{i=0}^{k} \Psi_{\mathbf{E}_{\max} \cdot T_\Psi(k, \varepsilon_\Psi)}(i) \geqslant 1 - \varepsilon_\Psi$.
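The condition above does not fix $T_\Psi(k, \varepsilon_\Psi)$ uniquely. One natural reading, which the sketch below assumes, is to take the largest time that still satisfies the inequality; since the Poisson cumulative probability of at most $k$ events decreases as the rate grows, this time can be found by bisection. Function names are hypothetical, and the sketch requires a truncation error strictly below 1 so that such a largest time exists.

```python
import math


def poisson_cdf(lam, k):
    """Probability of at most k events for a Poisson distribution."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return total


def t_psi(k, eps_psi, e_max, delta_min, iters=100):
    """Largest time with at most k Markovian jumps w.p. >= 1 - eps_psi."""
    if k == 1:
        return delta_min                      # T_psi(1, eps_psi) by definition
    if not 0 < eps_psi < 1:
        raise ValueError("eps_psi must lie in (0, 1)")
    lo, hi = 0.0, 1.0 / e_max
    while poisson_cdf(e_max * hi, k) >= 1 - eps_psi:
        lo, hi = hi, 2 * hi                   # grow until the bound is violated
    for _ in range(iters):
        mid = (lo + hi) / 2
        if poisson_cdf(e_max * mid, k) >= 1 - eps_psi:
            lo = mid
        else:
            hi = mid
    return lo                                 # lo always satisfies the bound
```

Note that nothing in this sketch forces the result to exceed `delta_min`; whether that matters depends on how the time points are combined in line 4 of Algorithm 2.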

*Searching for switching points within* $[t_k, t_{k+1}]$. In order to check whether $\widetilde{\mathrm{val}}^{\pi}(t + \delta) \geqslant \widetilde{\mathrm{val}}^{\pi,s\to\alpha}(t + \delta)$ for *all* $\delta \in [t_k, t_{k+1}]$ we only have to check whether the maximum of the function $\mathrm{diff}(s, \alpha, t + \delta) = \widetilde{\mathrm{val}}_s^{\pi,s\to\alpha}(t + \delta) - \widetilde{\mathrm{val}}_s^{\pi}(t + \delta)$ is at most 0 on this interval for all $s \in \mathit{PS}$, $\alpha \in \mathrm{Act}(s)$. In order to achieve this we work on the approximation of $\mathrm{diff}(s, \alpha, t + \delta)$ derived from Lemma 2, thus establishing a sufficient condition for the scheduler to remain optimal:

$$\begin{split} \widetilde{\mathrm{val}}\_{s}^{\pi,s\to\alpha}(t+\delta) &= \sum\_{s' \in MS \cup G} \mathcal{R}\_{s\to\alpha}(s,\pi,s') \cdot \widetilde{\mathrm{val}}\_{s'}^{\pi}(t+\delta) \\ &\leqslant \sum\_{s' \in MS \cup G} \mathcal{R}\_{s\to\alpha,\varepsilon\_N}(s,\pi,s') \sum\_{i=0}^{k} \Psi\_{\mathbf{E}\_{\max} \cdot \delta}(i) \cdot \mathbf{D}\_{\mathbf{u}(t),\varepsilon\_N}^{i}(s',\pi) \\ &+ \mathcal{R}\_{s\to\alpha,\varepsilon\_{N}}(s,\pi,G) + \varepsilon\_\Psi + \varepsilon\_{\mathrm{r}} \end{split} \tag{8}$$

Here $\mathcal{R}_{s\to\alpha,\varepsilon_N}(s, \pi, s')$ ($\mathcal{R}_{s\to\alpha,\varepsilon_N}(s, \pi, G)$) denotes an under-approximation of the value $\mathcal{R}_{s\to\alpha}(s, \pi, s')$ ($\mathcal{R}_{s\to\alpha}(s, \pi, G)$, resp.) up to $\varepsilon_N$, defined in Lemma 2. The same holds analogously for $\widetilde{\mathrm{val}}^{\pi}(t + \delta)$. Simple rewriting leads to the following:

$$\widetilde{\text{val}}\_s^{\pi,s\to\alpha}(t+\delta) - \widetilde{\text{val}}\_s^{\pi}(t+\delta) \leqslant \sum\_{i=0}^k \Psi\_{\mathbf{E}\_{\text{max}}\cdot\delta}(i) \cdot B\_{\pi,\varepsilon\_N}^i(s,\alpha) + C\_{\pi,\varepsilon\_N}(s,\alpha), \tag{9}$$

where $B^{i}_{\pi,\varepsilon_N}(s, \alpha) = \sum_{s' \in \mathit{MS} \setminus G} \big( \mathcal{R}_{s\to\alpha,\varepsilon_N}(s, \pi, s') - \mathcal{R}_{\varepsilon_N}(s, \pi, s') \big) \cdot \mathbf{D}^{i}_{\boldsymbol{u}(t),\varepsilon_N}(s', \pi)$ and $C_{\pi,\varepsilon_N}(s, \alpha) = \big( \mathcal{R}_{s\to\alpha,\varepsilon_N}(s, \pi, G) - \mathcal{R}_{\varepsilon_N}(s, \pi, G) \big) + \varepsilon_\Psi + \varepsilon_r$. In order to find the supremum of the right-hand side of (9) over all $\delta \in [a, b]$ we search for the extremum of each $y_i(\delta) = \Psi_{\mathbf{E}_{\max} \cdot \delta}(i) \cdot B^{i}_{\pi,\varepsilon_N}(s, \alpha)$, $i = 0..k$, separately as a function of $\delta$. Simple derivative analysis shows that the extremum of these functions is achieved at $\delta = i / \mathbf{E}_{\max}$. Truncation of the time interval by $(t_k \cdot \mathbf{E}_{\max} + 1) / \mathbf{E}_{\max}$ (step 4, Algorithm 2) ensures that for all $i = 0..k$ the extremum of $y_i(\delta)$ is attained at either $\delta = t_k$ or $\delta = t_{k+1}$.

**Lemma 4.** *Let* $[t_k, t_{k+1}]$ *be the interval considered by* CheckInterval *at iteration* $k$*.* $\forall \delta \in [t_k, t_{k+1}], s \in \mathit{PS}, \alpha \in \mathrm{Act}$*:*

$$\text{diff}(s, \alpha, t + \delta) \leqslant \sum\_{i=0}^{k} \Psi\_{\mathbf{E}\_{\text{max}} \cdot \delta(s, \alpha, i)}(i) \cdot B\_{\pi, \varepsilon\_N}^{i}(s, \alpha) + C\_{\pi, \varepsilon\_N}(s, \alpha), \tag{10}$$

*where*

$$\delta(s,\alpha,i) = \begin{cases} t\_k & \text{if } B^i\_{\pi,\varepsilon\_N}(s,\alpha) \geqslant 0 \text{ and } i/\mathbf{E}\_{\max} \leqslant t\_k\\ & \text{or } B^i\_{\pi,\varepsilon\_N}(s,\alpha) \leqslant 0 \text{ and } i/\mathbf{E}\_{\max} > t\_k\\ t\_{k+1} & \text{otherwise} \end{cases}$$

CheckInterval returns false iff for all $s \in \mathit{PS}$, $\alpha \in \mathrm{Act}$ the right-hand side of (10) is less than or equal to 0. Since Lemma 4 over-approximates $\mathrm{diff}(s, \alpha, t + \delta)$, false positives are inevitable: procedure CheckInterval may suggest that there exists a switching point within $[t_k, t_{k+1}]$ while in reality there is none. This, however, does not affect the correctness of the algorithm, only its running time.
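The end-point selection of Lemma 4 can be sketched as follows, with the coefficients $B^{i}$ and $C$ of one transition passed in as plain numbers. The function names are hypothetical, and the sketch relies on the property stated above that no peak $i / \mathbf{E}_{\max}$ lies strictly inside the interval, so that each term is monotone on it.

```python
import math


def poisson_pmf(lam, i):
    """Poisson probability mass Psi_lam(i), evaluated in log-space."""
    if lam == 0.0:
        return 1.0 if i == 0 else 0.0
    return math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))


def diff_upper_bound(b, c, t_k, t_k1, e_max):
    """Right-hand side of (10) for one transition.

    b[i] is the coefficient B^i and c is the constant C of that transition.
    """
    total = c
    for i, b_i in enumerate(b):
        peak = i / e_max   # Psi_{E_max * delta}(i) is maximal at this delta
        at_left = (b_i >= 0 and peak <= t_k) or (b_i <= 0 and peak > t_k)
        delta = t_k if at_left else t_k1
        total += poisson_pmf(e_max * delta, i) * b_i
    return total


def check_interval(transitions, t_k, t_k1, e_max):
    """True iff the bound is positive for some transition (b, c)."""
    return any(diff_upper_bound(b, c, t_k, t_k1, e_max) > 0
               for b, c in transitions)
```

Each term of the sum is bounded by its value at one of the two end points, which is why two time points per interval suffice.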

*Finding Maximal Transitions.* Here we show that there exists a subset of states, such that, if the optimal strategy for these states does not change on an interval, then the optimal strategy for *all* states does not change on this interval.

In the following we call a pair $(s, \alpha) \in \mathit{PS} \times \mathrm{Act}$ a *transition*. For transitions $(s, \alpha), (s', \alpha') \in \mathit{PS} \times \mathrm{Act}$ we write $(s, \alpha) \preceq_k (s', \alpha')$ iff $C_{\pi,\varepsilon_N}(s, \alpha) \leqslant C_{\pi,\varepsilon_N}(s', \alpha')$ and $\forall i = 0..k : B^{i}_{\pi,\varepsilon_N}(s, \alpha) \leqslant B^{i}_{\pi,\varepsilon_N}(s', \alpha')$. We say that a transition $(s, \alpha)$ is *maximal* if there exists no other transition $(s', \alpha')$ that satisfies the following: $(s, \alpha) \preceq_k (s', \alpha')$ and at least one of the following conditions holds: $C_{\pi,\varepsilon_N}(s, \alpha) < C_{\pi,\varepsilon_N}(s', \alpha')$ or $\exists i = 0..k : B^{i}_{\pi,\varepsilon_N}(s, \alpha) < B^{i}_{\pi,\varepsilon_N}(s', \alpha')$. The set of all maximal transitions is denoted by $T_{\max}(k)$.

We prove that if inequality (10) holds for all transitions from $T_{\max}(k)$, then it holds for all transitions. Thus only transitions from $T_{\max}(k)$ have to be checked by procedure CheckInterval. In our implementation we only compute $T_{\max}(k)$ before the call to CheckInterval at line 11 of Algorithm 2, and use the set $A = \mathit{PS} \times \mathrm{Act}$ within the while-loop.
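Computing the set of maximal transitions amounts to filtering out every transition that is dominated component-wise by another one. A minimal quadratic-time sketch, assuming the coefficients are stored in a hypothetical dictionary that maps each transition to a pair of $C$ and the list of $B^{i}$ values:

```python
def maximal_transitions(coeffs):
    """Return the transitions that no other transition strictly dominates.

    coeffs maps a transition (s, alpha) to a pair (C, [B^0, ..., B^k]).
    """
    def dominated(x, y):
        c_x, b_x = coeffs[x]
        c_y, b_y = coeffs[y]
        leq = c_x <= c_y and all(p <= q for p, q in zip(b_x, b_y))
        strict = c_x < c_y or any(p < q for p, q in zip(b_x, b_y))
        return leq and strict

    return {x for x in coeffs
            if not any(dominated(x, y) for y in coeffs if y != x)}


coeffs = {
    ("s", "a"): (0.1, [0.2, 0.3]),
    ("s", "b"): (0.0, [0.2, 0.1]),    # dominated by ("s", "a")
    ("p", "c"): (-0.1, [0.5, 0.0]),   # incomparable with ("s", "a")
}
front = maximal_transitions(coeffs)
```

Transitions with identical coefficients do not dominate each other, so both are kept, in line with the definition above.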

#### **4.4 Optimisation for Large Models**

Here we discuss a number of implementation improvements developers should consider when applying our algorithm to large case studies:

*Switching points.* It may happen that the optimal strategy switches very often on a time interval, while the effect of these frequent switches is negligible. The difference may be so small that the ε-optimal strategy actually stays stationary on this interval. In addition, floating-point computations may lead to imprecise results: values that are 0 in theory might be represented by non-zero floating-point numbers, making it seem as if the optimal strategy changed its decision when in fact it did not. To counteract these issues we can modify CheckInterval such that it outputs false even if the right-hand side of (10) is positive, as long as it is sufficiently small. The following lemma proves that the error introduced by not switching the decision is acceptable:

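**Lemma 5.** *Let* $\delta = t_{k+1} - t_k$*,* $\varepsilon' = \varepsilon - \varepsilon_i$*,* $\epsilon \in (0, \varepsilon' \cdot \delta / T)$ *and* $N(\delta, \epsilon) = (\mathbf{E}_{\max} \delta)^2 / 2.0 / \epsilon$*. If* $\forall s \in \mathit{PS}, \alpha \in \mathrm{Act}, \tau \in [t_k, t_{k+1}]$ *the right-hand side of (10) is not greater than* $(\varepsilon' \delta / T - \epsilon) / N(\delta, \epsilon)$*, then* $\pi$ *is* $\varepsilon' \delta / T$*-optimal in* $[t_k, t_{k+1}]$*.*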

*Optimal strategy.* In some cases the computation of the optimal strategy as described in Sect. 4.2 is computationally expensive, or not possible at all. This happens, for example, if some values $|\boldsymbol{d}^{(i)}_{\pi}(s)|$ are larger than the maximal floating-point number that a computer can store, if the computation of $|S| + 1$ derivatives is already prohibitive for models with a large state space, or if the values $\mathcal{R}(s, \pi, s')$ can only be approximated and not computed precisely. With the help of Lemma 5 and minor modifications to Algorithm 1, the correctness and convergence of Algorithm 1 can be preserved even when the strategy computed by FindStrategy is not guaranteed to be optimal.

### **5 Empirical Evaluation**

We implemented our algorithm as a part of IMCA [GHKN12]. Experiments were conducted as single-thread processes on an Intel Core i7-4790 with 32 GB of RAM. We compare the algorithm presented in this paper with [Neu10] and [BHHK15]. Both are available in IMCA. We use the following abbreviations to refer to the algorithms: FixStep for [Neu10], Unif<sup>+</sup> for [BHHK15] and SwitchStep for Algorithm 1. The value of the parameter w in Algorithm 1 is set to 0.1, ε<sup>i</sup> = 0. We keep the default values of all other algorithms.


**Table 1.** The discretisation step used in some of the experiments shown in Fig. 3.

The evaluation is performed on a set of published benchmarks:

dpm-**j**-**k:** A model of a *dynamic power management system* [QWP99], representing the internals of a Fujitsu disk drive. The model contains a queue, service requester, service provider and a power manager. The requester generates tasks of j types differing in energy requirements, that are stored in the queue of size k. The power manager selects the processing mode for the service provider. A state is a goal state if the queue of at least one task type is full.

qs**-j-k** and ps**-j-k:** Models of a *queuing system* [HH12] and a *polling system* [GHH+13] where incoming requests of j types are buffered in two queues of size k each, until they are processed by the server. We consider the state with both queues being full to form the goal state set.

The memory required by all three algorithms is polynomial in the size of the model. For the evaluation we therefore concentrate on runtime only. We set the time limit for the experiments to 15 minutes. Timeouts are marked by **x** in the plots. Runtimes are given in seconds. All plots use log-log axes.

#### **Results**

SwitchStep **vs** FixStep. Figure 3 compares runtimes of SwitchStep and FixStep. For these experiments precision is set to 10<sup>−3</sup> and the state space size ranges from 10<sup>2</sup> to 10<sup>5</sup>.

This plot represents the general trend observed in many experiments: the algorithm FixStep does not scale well with the size of the problem (state space, precision, time bound). For larger benchmarks it usually required more than 15 minutes. This is likely because the discretisation step used by FixStep is very small, so the algorithm performs many iterations. Indeed, Table 1 reports the size of the discretisation steps for both FixStep and SwitchStep on a few benchmarks. Here the column $\delta_F$ shows the length of the discretisation step of FixStep. As we mentioned in Sect. 3, this step is fixed for the selected values of time bound and precision. Columns min $\delta_S$, avg $\delta_S$ and max $\delta_S$ show the minimal, average and maximal steps used by SwitchStep, respectively. The average step used by SwitchStep is several orders of magnitude larger than that of FixStep. Therefore SwitchStep performs far fewer iterations. Even though each iteration takes longer, the significant decrease in the number of iterations leads to a much smaller total runtime.

**Fig. 3.** Running time comparison of FixStep and SwitchStep.

SwitchStep **vs** Unif<sup>+</sup>. In order to compare SwitchStep with Unif<sup>+</sup> we have to restrict ourselves to a subclass of Markov automata in which probabilistic and Markovian states alternate, and probabilistic states have only 1 successor for each action. This is because Unif<sup>+</sup> is available in IMCA only for this subclass of models.

**Table 2.** Parameters of the experiments shown in Fig. 4.

**Fig. 4.** Running times of algorithms SwitchStep and Unif<sup>+</sup>.

Figure 4 compares the running times of SwitchStep and Unif<sup>+</sup>. For the plot on the left we varied those model parameters that affect state space size, number of non-deterministic actions and maximal exit rate. In the plot on the right the model parameters are fixed, while the precision and time bounds used for the experiments vary. Table 2 shows the parameters of the models used in these experiments. We observe that there are cases in which SwitchStep performs remarkably better than Unif<sup>+</sup>, and cases in which the opposite holds. Consider the experiments in Fig. 4, right. They show that Unif<sup>+</sup> may be highly sensitive to variations of time bounds and precision, while SwitchStep is more robust in this respect. This is because the scheduler computed by Unif<sup>+</sup> has no means to observe time precisely and can only guess it. This may be good enough, which is the case on the ps benchmark. However, if it is not, then better precision will require many more computations. Additionally, Unif<sup>+</sup> does not use discretisation. This means that increasing the time bound from T<sub>1</sub> to T<sub>2</sub> may significantly increase the overall running time, even if no new switching points appear on the interval [T<sub>1</sub>, T<sub>2</sub>]. SwitchStep does not suffer from these issues because it considers schedulers that observe time precisely and uses discretisation. Large time intervals that introduce no switching points will likely be handled within one iteration.

In general, SwitchStep performs at its best when there are not too many switching points, which is what is observed in most published case studies.

*Conclusions:* We conclude that SwitchStep does not replace all existing algorithms for time bounded reachability. However it does improve the state of the art in many cases and thus occupies its own niche among available solutions.

### **References**

	- [BS11] Buchholz, P., Schulz, I.: Numerical analysis of continuous time Markov decision processes over finite horizons. Comput. OR **38**(3), 651–659 (2011). https://doi.org/10.1016/j.cor.2010.08.011
	- [Hat17] Hatefi-Ardakani, H.: Finite horizon analysis of Markov automata. Ph.D. thesis, Saarland University, Germany (2017). http://scidok.sulb.uni-saarland.de/volltexte/2017/6743/
	- [Her02] Hermanns, H.: Interactive Markov Chains: The Quest for Quantified Quality. LNCS, vol. 2428. Springer, Heidelberg (2002). https://doi.org/10.1007/3-540-45804-2
	- [HH12] Hatefi, H., Hermanns, H.: Model checking algorithms for Markov automata. ECEASST **53** (2012). http://journal.ub.tu-berlin.de/eceasst/article/view/783
	- [HH14] Hartmanns, A., Hermanns, H.: The modest toolset: an integrated environment for quantitative modelling and verification. In: Ábrahám, E., Havelund, K. (eds.) TACAS 2014. LNCS, vol. 8413, pp. 593–598. Springer, Heidelberg (2014). https://doi.org/10.1007/978-3-642-54862-8\_51
	- [HH15] Hatefi, H., Hermanns, H.: Improving time bounded reachability computations in interactive Markov chains. Sci. Comput. Program. **112**, 58–74 (2015). https://doi.org/10.1016/j.scico.2015.05.003
	- [HM14] Haddad, S., Monmege, B.: Reachability in MDPs: refining convergence of value iteration. In: Ouaknine, J., Potapov, I., Worrell, J. (eds.) RP 2014. LNCS, vol. 8762, pp. 125–137. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-11439-2\_10
	- [Jen53] Jensen, A.: Markoff chains as an aid in the study of markoff processes. Scand. Actuarial J. **1953**(sup1), 87–91 (1953). https://doi.org/10.1080/03461238.1953.10419459
	- [KNP11] Kwiatkowska, M.Z., Norman, G., Parker, D.: PRISM 4.0: verification of probabilistic real-time systems. In: Gopalakrishnan, G., Qadeer, S. (eds.) CAV 2011. LNCS, vol. 6806, pp. 585–591. Springer, Heidelberg (2011). https://doi.org/10.1007/978-3-642-22110-1\_47
	- [MCB84] Marsan, M.A., Conte, G., Balbo, G.: A class of generalized stochastic Petri nets for the performance evaluation of multiprocessor systems. ACM Trans. Comput. Syst. **2**(2), 93–122 (1984). https://doi.org/10.1145/190.191
	- [Mil68] Miller, B.: Finite state continuous time Markov decision processes with a finite planning horizon. SIAM J. Control **6**(2), 266–280 (1968). https://doi.org/10.1137/0306020
	- [Mol82] Molloy, M.K.: Performance analysis using stochastic Petri nets. IEEE Trans. Comput. **C–31**(9), 913–917 (1982)
	- [Neu10] Neuhäußer, M.R.: Model checking nondeterministic and randomly timed systems. Ph.D. thesis, RWTH Aachen University (2010). http://darwin.bth.rwth-aachen.de/opus3/volltexte/2010/3136/
	- [Put94] Puterman, M.L.: Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1st edn. Wiley, Hoboken (1994)
	- [QK18] Quatmann, T., Katoen, J.-P.: Sound value iteration. In: Chockler, H., Weissenbacher, G. (eds.) CAV 2018. LNCS, vol. 10981, pp. 643–661. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-96145-3\_37
	- [RS13] Rabe, M.N., Schewe, S.: Optimal time-abstract schedulers for CTMDPs and continuous-time Markov games. Theor. Comput. Sci. **467**, 53–67 (2013). https://doi.org/10.1016/j.tcs.2012.10.001

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Synthesis

# **Minimal-Time Synthesis for Parametric Timed Automata**

Étienne André<sup>1,2,3</sup>, Vincent Bloemen<sup>4(B)</sup>, Laure Petrucci<sup>1</sup>, and Jaco van de Pol<sup>4,5</sup>

<sup>1</sup> LIPN, CNRS UMR 7030, Université Paris 13, Villetaneuse, France
<sup>2</sup> JFLI, CNRS, Tokyo, Japan
<sup>3</sup> National Institute of Informatics, Tokyo, Japan
<sup>4</sup> University of Twente, Enschede, The Netherlands, v.bloemen@utwente.nl
<sup>5</sup> University of Aarhus, Aarhus, Denmark

**Abstract.** Parametric timed automata (PTA) extend timed automata by allowing parameters in clock constraints. Such a formalism is for instance useful when reasoning about unknown delays in a timed system. Using existing techniques, a user can synthesize the parameter constraints that allow the system to reach a specified goal location, regardless of how much time has passed for the internal clocks.

We focus on synthesizing parameters such that not only the goal location is reached, but we also address the following questions: *what is the minimal time to reach the goal location?* and *for which parameter values can we achieve this?* We analyse the problem and present a semialgorithm to solve it. We also discuss and provide solutions for minimizing a specific parameter value to still reach the goal.

We empirically study the performance of these algorithms on a benchmark set for PTAs and show that *minimal-time reachability synthesis* is more efficient to compute than the standard synthesis algorithm for reachability. Data or code related to this paper is available at: [26].

### **1 Introduction**

*Timed Automata (TA)* [2] extend finite automata with *clocks*, for instance to model real-time systems. Timed automata allow for reasoning about temporal properties of the designed system. In addition to reachability problems, it is possible to compute for TAs the minimal or maximal time required to reach a specific goal location. Such a result is valuable in practice, as it can describe the response time of a system or it may indicate when a component failure occurs.

© The Author(s) 2019

This work is partially supported by the ANR national research program PACS (ANR-14-CE28-0002) and PHC Van Gogh project PAMPAS.

É. André: Partially supported by ERATO HASUO Metamathematics for Systems Design Project (No. JPMJER1603), JST.

V. Bloemen: Supported by the 3TU.BSR project.

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 211–228, 2019. https://doi.org/10.1007/978-3-030-17465-1\_12

**Fig. 1.** Train delay scheduling problem: Alice (depicted in dotted red), located at A, wants to go to station D. Bob (depicted in dashed blue), located at B, wants to go to A. By setting the train delays *D<sub>1</sub>* and *D<sub>2</sub>* for trains 1 and 2, make sure that both Alice and Bob reach their target station in minimum total time. (Color figure online)

It may not always be possible to describe a real-time system with a TA. There are often uncertainties in the timing constraints, for instance how long it takes between sending and receiving a message. Optimising specific timing delays to improve the overall throughput of the system may also be considered, as shown in Example 1. Such uncertainties can however be modelled using a *parametric timed automaton (PTA)* [3]. A PTA adds parameters, or unknown constants, to the TA formalism. By examining the reachability of a goal location, the parameters get constrained and we can observe which parameter valuations preserve the reachability of the goal location.

This process, also called *parameter synthesis*, is definitely useful for analysing reachability properties of a system. However, this technique does disregard timing aspects to some extent. Given the parameter constraints, it is no longer possible to give clear boundaries on the time to reach the goal, as this may depend on the parameter valuations. We focus on the parameter synthesis problem while reaching the goal location in minimal time, as demonstrated in Example 1.

*Example 1.* Consider the example in Fig. 1, which depicts a train network consisting of two trains. Both trains share locations B and D (the station platforms), while locations A', B', C', D', B", and D" represent a train travelling (tracks). The travel time between any two stations is 100 for train 1 and 55 for train 2. Train 1 stops at stations A, B, C, and D for *D<sub>1</sub>* time units (and train 2 stops for *D<sub>2</sub>* time units at B and D). Here, the train delays *D<sub>1</sub>* and *D<sub>2</sub>* are parameters and x<sub>1</sub> and x<sub>2</sub> are clocks. Both clocks start at 0 and reset after every transition. We assume that the trains use different tracks and that changing trains at the platform of a station can be done in negligible time.

Alice is starting her journey from A and would like to go to D. Bob is located at B and wants to go to A. Train 1 and/or 2 can be used to travel, if both the train and the person are at the same location. Initially, both Alice and Bob wait for a train, since the initial positions of train 1 and 2 are respectively C' and D".

We would like to set the train delays *D<sub>1</sub>* and *D<sub>2</sub>* in such a way that the total time for Alice and Bob to reach their target location, i. e. the PTA location for which Alice is at station D and Bob is at station A, is minimal. The optimal solution is *D<sub>1</sub>* = 25 ∧ *D<sub>2</sub>* = 15, which leads to a total time of 405 units<sup>1</sup>. Note that this is neither optimal for Alice (the fastest would be *D<sub>1</sub>* = 0 ∧ *D<sub>2</sub>* = 5), nor optimal for Bob (*D<sub>1</sub>* = 10 ∧ *D<sub>2</sub>* = 0).
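The arithmetic behind the optimum of Example 1 can be replayed with a small timetable computation. The sketch below is not the PTA model itself: it assumes that train 1 cycles through A, B, C, D in that order and that train 2 alternates between B and D (both read off Fig. 1 and footnote 1), that each train starts on a track segment with its clock at 0, and that a passenger can board a train at any moment up to its departure. The helper names `timetable` and `ride` are hypothetical, and the itineraries are those of footnote 1 rather than the result of a search.

```python
from itertools import cycle

def timetable(route, travel, dwell, horizon):
    """List (station, arrival, departure) for a train that starts on the
    track leading to route[0] at time 0 and then cycles through route."""
    stops, t = [], travel
    for station in cycle(route):
        if t > horizon:
            break
        stops.append((station, t, t + dwell))
        t += dwell + travel
    return stops

def ride(stops, origin, dest, ready):
    """Board at origin (first stop departing at or after `ready`) and
    return the arrival time at the next stop at dest."""
    for i, (station, _, departure) in enumerate(stops):
        if station == origin and departure >= ready:
            for later, arrival, _ in stops[i + 1:]:
                if later == dest:
                    return arrival
    raise ValueError("no connection within the horizon")

D1, D2 = 25, 15  # the optimal delays stated in Example 1
train1 = timetable(["D", "A", "B", "C"], travel=100, dwell=D1, horizon=500)
train2 = timetable(["B", "D"], travel=55, dwell=D2, horizon=500)

# Alice: train 1 from A to B, then train 2 from B to D.
alice = ride(train2, "B", "D", ride(train1, "A", "B", 0))
# Bob: train 2 from B to D, then train 1 from D to A.
bob = ride(train1, "D", "A", ride(train2, "B", "D", 0))
total = max(alice, bob)  # both must have arrived
```

Under these assumptions Alice arrives at time 405 and Bob at time 225, so the total time is 405, in line with the footnote.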

Note that in other instances, the time to reach a goal location may be an interval, describing the lower and upper bound on the time. This can be achieved in the example by changing the travel time of train 1 to be between 95 and 105, by guarding the outgoing transitions from locations A', B', C' and D' with 95 ≤ x<sub>1</sub> ≤ 105 (instead of x<sub>1</sub> = 100). We focus on the lower-bound *global time*, meaning that we look at the minimal *total* time passed in the system, which may differ from the clock values as the clocks can be reset.

In this paper, we address the following problems:


For all stated problems we provide algorithms to solve them and empirically compare them with a set of benchmark experiments for PTAs, obtained from [5]. Interestingly, compared to standard reachability and synthesis, minimal-time reachability and synthesis is in general computed faster as fewer states have to be considered in the exploration. We also look at the computability and intractability of the problems for PTAs and L/U-PTAs (PTAs for which each parameter only appears as a lower- or upper-bound).

*Related work.* The earliest work on minimal-time reachability for timed automata was by Courcoubetis and Yannakakis [17], who first addressed the problem of computing lower and upper bounds. Several algorithms have been developed since to improve performance [22,24,25], e. g. by using parallelism. Related problems have been studied, such as minimal-time reachability for weighted timed automata [4], minimal-cost reachability in priced timed automata [12], and job scheduling for timed automata [1].

Concerning parametric timed automata, to the best of our knowledge, the minimal-time reachability problem was not tackled in the past. The reachability-emptiness problem ("the emptiness of the parameter valuation set for which a given set of locations is reachable") is undecidable [3], with various settings considered, notably a single clock compared to parameters [21] or a single rational-valued or integer-valued parameter [14,21] (see [6] for a survey). Only severely limiting the number of clocks (e. g. [3,11,14,16]), and often restricting to integer-valued parameters, can bring some decidability. Emptiness for the subclass of L/U-PTAs is also decidable [13]. Minimizing a parameter can however be done in the setting of upper-bound PTAs (PTAs in which the clocks are only restricted from above): the exact synthesis of integer valuations for which a location is reachable can be done [15], and therefore the minimum valuation of a parameter can be obtained.

<sup>1</sup> Alice waits for train 1 to reach A at time 225, then she hops on and exits the train at time 350 at B. There she can immediately take train 2 and reach D at time 405. Bob waits for train 2 to reach B at time 55 and takes this train. At time 125 he reaches D and can immediately hop on train 1. Bob reaches A at time 225.

### **2 Preliminaries**

We assume a set $X = \{x_1, \dots, x_{|X|}\}$ of *clocks*, i. e. real-valued variables that evolve at the same rate. A clock valuation is a function $\nu_X : X \to \mathbb{R}_{\geq 0}$. We write $\mathbf{0}$ for the clock valuation assigning 0 to all clocks. Given $d \in \mathbb{R}_{\geq 0}$, $\nu_X + d$ is the valuation s.t. $(\nu_X + d)(x) = \nu_X(x) + d$ for all $x \in X$. Given $R \subseteq X$, we define the *reset* of a valuation $\nu_X$, denoted by $[\nu_X]_R$, as follows: $[\nu_X]_R(x) = 0$ if $x \in R$, and $[\nu_X]_R(x) = \nu_X(x)$ otherwise.
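The two operations on clock valuations can be illustrated with a minimal sketch, assuming a valuation is stored as a dictionary from clock names to numbers. The function names `delay` and `reset` are chosen here for illustration only.

```python
def delay(nu, d):
    """Return nu + d: every clock advances by the same amount d >= 0."""
    if d < 0:
        raise ValueError("delays are non-negative")
    return {x: v + d for x, v in nu.items()}

def reset(nu, R):
    """Return [nu]_R: clocks in R are set to 0, the others are unchanged."""
    return {x: (0 if x in R else v) for x, v in nu.items()}

zero = {"x1": 0, "x2": 0}            # the valuation assigning 0 to all clocks
nu = reset(delay(zero, 3), {"x1"})   # x1 = 0, x2 = 3
nu = delay(nu, 2)                    # x1 = 2, x2 = 5
```

Both functions return a new dictionary, so earlier valuations stay available when exploring several successors of the same state.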

We assume a set $P = \{p_1, \dots, p_{|P|}\}$ of *parameters*. A parameter *valuation* $\nu_P$ is a function $\nu_P : P \to \mathbb{Q}_+$. We denote ${\bowtie} \in \{<, \leq, =, \geq, >\}$, ${\triangleleft} \in \{<, \leq\}$, and ${\triangleright} \in \{>, \geq\}$. A guard $g$ is a constraint over $X \cup P$ defined by a conjunction of inequalities of the form $x \bowtie d$ or $x \bowtie p$, with $x \in X$, $d \in \mathbb{N}$ and $p \in P$. Given a guard $g$, we write $\nu_X \models \nu_P(g)$ if the expression obtained by replacing each clock $x \in X$ appearing in $g$ by $\nu_X(x)$ and each parameter $p \in P$ appearing in $g$ by $\nu_P(p)$ evaluates to true.
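Guard satisfaction can be sketched as follows, assuming a guard is encoded as a list of atoms `(clock, op, rhs)` whose right-hand side is either an integer constant or a parameter name. This encoding and the name `satisfies` are illustrative choices, not part of the formalism.

```python
import operator
from fractions import Fraction

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">=": operator.ge, ">": operator.gt}

def satisfies(nu_x, nu_p, guard):
    """Check nu_x |= nu_p(g): substitute parameters and clocks by their
    values and evaluate the conjunction of all atoms."""
    for clock, op, rhs in guard:
        bound = nu_p[rhs] if isinstance(rhs, str) else rhs
        if not OPS[op](nu_x[clock], bound):
            return False
    return True

guard = [("x1", ">=", 2), ("x1", "<=", "p1")]              # 2 <= x1 <= p1
nu_p = {"p1": Fraction(5, 2)}                              # rational parameter value
ok = satisfies({"x1": Fraction(9, 4)}, nu_p, guard)        # 2 <= 9/4 <= 5/2
too_late = satisfies({"x1": 3}, nu_p, guard)               # violates x1 <= p1
```

Using `Fraction` keeps rational parameter values exact, which matters when a guard is an equality.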

### **2.1 Parametric Timed Automata**

**Definition 1 (PTA).** *A PTA* $A$ *is a tuple* $A = (\Sigma, L, \ell_0, X, P, I, E)$*, where: (i)* $\Sigma$ *is a finite set of actions, (ii)* $L$ *is a finite set of locations, (iii)* $\ell_0 \in L$ *is the initial location, (iv)* $X$ *is a finite set of clocks, (v)* $P$ *is a finite set of parameters, (vi)* $I$ *is the invariant, assigning to every* $\ell \in L$ *a guard* $I(\ell)$*, (vii)* $E$ *is a finite set of edges* $e = (\ell, g, a, R, \ell')$ *where* $\ell, \ell' \in L$ *are the source and target locations,* $a \in \Sigma$*,* $R \subseteq X$ *is a set of clocks to be reset, and* $g$ *is a guard.*

Given a parameter valuation $\nu_P$ and a PTA $A$, we denote by $\nu_P(A)$ the non-parametric structure where all occurrences of a parameter $p \in P$ have been replaced by $\nu_P(p)$. Any structure $\nu_P(A)$ is also a *timed automaton*. By assuming a rescaling of the constants (multiplying all constants in $\nu_P(A)$ by their least common denominator), we obtain an equivalent (integer-valued) TA.
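The instantiation and rescaling step can be sketched on a single guard, assuming the same illustrative encoding of atoms as `(clock, op, rhs)` triples. The helper names `instantiate` and `rescale` are hypothetical. After rescaling by a factor k, one time unit of the original model corresponds to k time units of the integer-valued one.

```python
from fractions import Fraction
from math import lcm  # available from Python 3.9

def instantiate(guard, nu_p):
    """Replace every parameter name by its (rational) value."""
    return [(clock, op,
             Fraction(nu_p[rhs]) if isinstance(rhs, str) else Fraction(rhs))
            for clock, op, rhs in guard]

def rescale(guard):
    """Multiply all constants by the least common denominator."""
    factor = lcm(*(bound.denominator for _, _, bound in guard))
    return factor, [(clock, op, int(bound * factor))
                    for clock, op, bound in guard]

guard = [("x", ">=", 1), ("x", "<=", "p1"), ("y", "<", "p2")]
concrete = instantiate(guard, {"p1": Fraction(3, 2), "p2": Fraction(2, 3)})
factor, scaled = rescale(concrete)   # denominators 1, 2, 3 give factor 6
```

In a full automaton the factor would be computed once over all guards and invariants, so that every constant is scaled consistently.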

**Definition 2 (L/U-PTA).** *An* L/U-PTA *is a PTA where the set of parameters is partitioned into lower-bound parameters and upper-bound parameters, i. e. parameters that appear only in guards and invariants in inequalities of the form* $p \triangleleft x$ *(with* ${\triangleleft} \in \{<, \leq\}$*), or of the form* $p \triangleright x$ *(with* ${\triangleright} \in \{>, \geq\}$*), respectively.*

**Definition 3 (Semantics of a PTA).** *Given a PTA* $A = (\Sigma, L, \ell_0, X, P, I, E)$*, and a parameter valuation* $\nu_P$*, the semantics of* $\nu_P(A)$ *is given by the timed transition system (TTS)* $(S, s_0, \rightarrow)$*, with:*

*–* $S = \{(\ell, \nu_X) \in L \times \mathbb{R}_{\geq 0}^{|X|} \mid \nu_X \models \nu_P(I(\ell))\}$*,* $s_0 = (\ell_0, \mathbf{0})$*,*

*–* $\rightarrow$ *consists of the discrete and (continuous) delay transition relations: (i) discrete transitions:* $(\ell, \nu_X) \xrightarrow{e} (\ell', \nu'_X)$*, if* $(\ell, \nu_X), (\ell', \nu'_X) \in S$*, and there exists* $e = (\ell, g, a, R, \ell') \in E$*, such that* $\nu'_X = [\nu_X]_R$*, and* $\nu_X \models \nu_P(g)$*, (ii) delay transitions:* $(\ell, \nu_X) \xrightarrow{d} (\ell, \nu_X + d)$*, with* $d \in \mathbb{R}_{\geq 0}$*, if* $\forall d' \in [0, d], (\ell, \nu_X + d') \in S$*.*

Moreover we write $(\ell, \nu_X) \xrightarrow{(d,e)} (\ell', \nu'_X)$ for a combination of a delay and discrete transition if $\exists \nu''_X : (\ell, \nu_X) \xrightarrow{d} (\ell, \nu''_X) \xrightarrow{e} (\ell', \nu'_X)$.

Given a TA $\nu_P(A)$ with concrete semantics $(S, s_0, \rightarrow)$, we refer to the states of $S$ as the *concrete states* of $\nu_P(A)$. A *run* $\rho$ of $\nu_P(A)$ is a possibly infinite alternating sequence of concrete states of $\nu_P(A)$, and pairs of edges and delays, starting from the initial state $s_0$, of the form $s_0, (d_0, e_0), s_1, \cdots$, with $i = 0, 1, \dots$, and $d_i \in \mathbb{R}_{\geq 0}$, $e_i \in E$, and $(s_i, e_i, s_{i+1}) \in {\rightarrow}$. The set of all finite runs over $\nu_P(A)$ is denoted by $\mathit{Runs}(\nu_P(A))$. The *duration* of a finite run $\rho = s_0, (d_0, e_0), s_1, \cdots, s_i$ is given by $\mathit{duration}(\rho) = \sum_{0 \leq j \leq i-1} d_j$.

Given a state $s = (\ell, \nu_X)$, we say that $s$ is reachable in $\nu_P(A)$ if $s$ is the last state of a run of $\nu_P(A)$. By extension, we say that $\ell$ is reachable; and by extension again, given a set $T$ of locations, we say that $T$ is reachable if there exists $\ell \in T$ such that $\ell$ is reachable in $\nu_P(A)$. The set of all finite runs of $\nu_P(A)$ that reach $T$ is denoted by $\mathit{Reach}(\nu_P(A), T)$.

*Minimal reachability.* The minimal time is not necessarily attained: it may be a value that can only be approached from above, such as the smallest value larger than an integer<sup>2</sup>. We therefore define a minimum as either a pair $(c, \succ) \in \mathbb{Q}_+ \times \{=, >\}$ or $\infty$. The comparison operators function as follows: $(c, =) < \infty$, $(c, >) < \infty$, and $(c_1, \succ_1) < (c_2, \succ_2)$ iff either $c_1 < c_2$, or $c_1 = c_2$, $\succ_1$ is $=$ and $\succ_2$ is $>$<sup>3</sup>.
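The order on minima can be made concrete with a small sketch, assuming a minimum is encoded as a pair of a number and one of the strings `"="` or `">"`, and that ∞ is encoded as `(math.inf, "=")`. Both encodings and the names `less` and `smallest` are illustrative. A guard such as x > 1 on the only path to the goal would yield the minimum `(1, ">")`.

```python
import math

INFINITY = (math.inf, "=")   # stands for the minimum of an empty set

def less(m1, m2):
    """Strict order on minima: a smaller value wins, and for equal values
    an attained minimum (=) is smaller than a non-attained one (>)."""
    (c1, op1), (c2, op2) = m1, m2
    return c1 < c2 or (c1 == c2 and op1 == "=" and op2 == ">")

def smallest(minima):
    """Fold a collection of minima into the least one."""
    best = INFINITY
    for m in minima:
        if less(m, best):
            best = m
    return best

best = smallest([(2, ">"), (3, "="), (2, "=")])
```

Here `best` is `(2, "=")`: the value 2 is attained by one candidate, which beats the candidate that only approaches 2 from above.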

Given a set of locations $T$, the minimal time reachability of $T$ in $\nu_P(A)$, denoted by $\mathit{MinTimeReach}(\nu_P(A), T) = \min\{\mathit{duration}(\rho) \mid \rho \in \mathit{Reach}(\nu_P(A), T)\}$, is the minimal duration over all runs of $\nu_P(A)$ reaching $T$.

By extension, given a PTA, we denote by $\mathit{MinTimePTA}(A, T)$ the minimal time reachability of $T$ over all valuations, i. e. $\mathit{MinTimePTA}(A, T) = \min_{\nu_P} \mathit{MinTimeReach}(\nu_P(A), T)$. As we will be interested in synthesizing the valuations leading to the minimal time, let us define $\mathit{MinTimeSynth}(A, T) = \{\nu_P \mid \mathit{MinTimeReach}(\nu_P(A), T) = \mathit{MinTimePTA}(A, T)\}$.

We will also be interested in minimizing the valuation of a given parameter $p_i$ (without any notion of time) reaching a given location, and we therefore define $\mathit{MinParamReach}(A, p_i, T) = \min_{\nu_P} \{\nu_P(p_i) \mid \mathit{Reach}(\nu_P(A), T) \neq \emptyset\}$. Similarly, we will be interested in synthesizing *all* valuations leading to the minimal valuation of $p_i$ reaching $T$, so let us define $\mathit{MinParamSynth}(A, p_i, T) = \{\nu_P \mid \mathit{Reach}(\nu_P(A), T) \neq \emptyset \land \nu_P(p_i) = \mathit{MinParamReach}(A, p_i, T)\}$.

<sup>2</sup> Consider a TA with a transition guarded by $x > 1$ from $\ell_0$ to $\ell_1$; then the minimal duration of runs reaching $\ell_1$ is not 1 but slightly more.

<sup>3</sup> When we compute the minimum over a set, we actually calculate its infimum and combine the value with either = or > to indicate whether the value is present in the set.

#### **2.2 Computation Problems**

**Minimal-time reachability problem:** Input: A PTA A, a subset T ⊆ L of its locations. Problem: Compute *MinTimePTA*(A, T).

**Minimal-time reachability synthesis problem:** Input: A PTA A, a subset T ⊆ L of its locations. Problem: Compute *MinTimeSynth*(A, T).

Before addressing these problems, we will address the slightly different problem of minimal-parameter reachability, i. e. the minimization of a parameter reaching a given location (independently of time). We will see in Lemma 1 that this problem can also give an answer to the minimal-time reachability (synthesis) problem.

**Minimal-parameter reachability problem:** Input: A PTA A, a parameter p, a subset T ⊆ L of the locations of A. Problem: Compute *MinParamReach*(A,T,p).

**Minimal-parameter reachability synthesis problem:** Input: A PTA A, a parameter p, a subset T ⊆ L of the locations of A. Problem: Synthesize *MinParamSynth*(A,T,p).

### **2.3 Symbolic Semantics**

Let us now recall the symbolic semantics of PTAs (see e. g. [8,19]), that we will use to solve these problems.

*Constraints.* We first define operations on constraints. A linear term over $X \cup P$ is of the form $\sum_{1 \leq i \leq |X|} \alpha_i x_i + \sum_{1 \leq j \leq |P|} \beta_j p_j + d$, with $x_i \in X$, $p_j \in P$, and $\alpha_i, \beta_j, d \in \mathbb{Z}$. A *constraint* $C$ (i. e. a convex polyhedron) over $X \cup P$ is a conjunction of inequalities of the form $\mathit{lt} \bowtie 0$, where $\mathit{lt}$ is a linear term and ${\bowtie} \in \{<, \leq, =, \geq, >\}$. $\bot$ denotes the false parameter constraint, i. e. the constraint over $P$ containing no valuation.

Given a parameter valuation $\nu_P$, $\nu_P(C)$ denotes the constraint over $X$ obtained by replacing each parameter $p$ in $C$ with $\nu_P(p)$. Likewise, given a clock valuation $\nu_X$, $\nu_X(\nu_P(C))$ denotes the expression obtained by replacing each clock $x$ in $\nu_P(C)$ with $\nu_X(x)$. We say that $\nu_P$ *satisfies* $C$, denoted by $\nu_P \models C$, if the set of clock valuations satisfying $\nu_P(C)$ is non-empty. Given a parameter valuation $\nu_P$ and a clock valuation $\nu_X$, we denote by $\nu_X|\nu_P$ the valuation over $X \cup P$ such that for all clocks $x$, $\nu_X|\nu_P(x) = \nu_X(x)$ and for all parameters $p$, $\nu_X|\nu_P(p) = \nu_P(p)$. We use the notation $\nu_X|\nu_P \models C$ to indicate that $\nu_X(\nu_P(C))$ evaluates to true. We say that $C$ is *satisfiable* if $\exists \nu_X, \nu_P$ s.t. $\nu_X|\nu_P \models C$.

We define the *time elapsing* of $C$, denoted by $C^{\nearrow}$, as the constraint over $X$ and $P$ obtained from $C$ by delaying all clocks by an arbitrary amount of time. That is, $\nu'_X|\nu_P \models C^{\nearrow}$ iff $\exists \nu_X : X \to \mathbb{R}_+, \exists d \in \mathbb{R}_+$ s.t. $\nu_X|\nu_P \models C \land \nu'_X = \nu_X + d$. Given $R \subseteq X$, we define the *reset* of $C$, denoted by $[C]_R$, as the constraint obtained from $C$ by resetting the clocks in $R$, and keeping the other clocks unchanged. Given a subset $P' \subseteq P$ of parameters, we denote by $C{\downarrow}_{P'}$ the projection of $C$ onto $P'$, i. e. obtained by eliminating the clock variables and the parameters in $P \setminus P'$ (e. g. using Fourier-Motzkin). Therefore, $C{\downarrow}_P$ denotes the elimination of the clock variables only, i. e. the projection onto $P$. Given $p$, we denote by $\mathrm{GetMin}(C, p)$ the minimum of $p$ in a form $(c, \succ)$ with ${\succ} \in \{=, >\}$. Technically, GetMin can be implemented using polyhedral operations as follows: $C{\downarrow}_{\{p\}}$ is computed, and then the infimum is extracted; then the operator in $\{=, >\}$ is inferred depending on whether $C{\downarrow}_{\{p\}}$ is bounded from below using a closed or an open constraint. We extend GetMin to accommodate clocks, thus $\mathrm{GetMin}(C, x)$ returns the minimal clock value that $x$ can take, while conforming to $C$.
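The last step of GetMin, extracting the infimum and its operator, can be sketched once the projection onto a single parameter is available. The sketch below assumes that the projection has already been computed by a polyhedra library and is a non-empty interval, handed over as a list of lower bounds `(c, strict)`. This input format and the name `get_min` are assumptions made for illustration.

```python
from fractions import Fraction

def get_min(lower_bounds):
    """Return (c, op) for a one-dimensional, non-empty projection.

    lower_bounds: atoms (c, strict) meaning p > c if strict, else p >= c.
    Parameters are non-negative, so the bound p >= 0 is always added."""
    bounds = [(Fraction(0), False)]
    bounds += [(Fraction(c), strict) for c, strict in lower_bounds]
    c = max(value for value, _ in bounds)             # tightest lower bound
    is_open = any(strict for value, strict in bounds if value == c)
    return (c, ">" if is_open else "=")

m_open = get_min([(2, True)])                 # p > 2
m_closed = get_min([(2, False), (1, True)])   # p >= 2 and p > 1
m_free = get_min([])                          # only p >= 0
```

A closed tightest bound yields `"="` because the infimum belongs to the projection, while an open one yields `">"`.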

A symbolic state is a pair $(\ell, C)$ where $\ell \in L$ is a location, and $C$ its associated constraint, called *parametric zone*.

**Definition 4 (Symbolic semantics).** *Given a PTA* $A = (\Sigma, L, \ell_0, X, P, I, E)$*, the symbolic semantics of* $A$ *is defined by the labelled transition system called the* parametric zone graph $\mathrm{PZG} = (E, \mathbf{S}, \mathbf{s}_0, \Rightarrow)$*, with*

*–* $\mathbf{S} = \{(\ell, C) \mid C \subseteq I(\ell)\}$*,* $\mathbf{s}_0 = \big(\ell_0, (\bigwedge_{1 \leq i \leq |X|} x_i = 0)^{\nearrow} \land I(\ell_0)\big)$*, and*

*–* $\big((\ell, C), e, (\ell', C')\big) \in {\Rightarrow}$ *if* $e = (\ell, g, a, R, \ell') \in E$ *and* $C' = \big([(C \land g)]_R \land I(\ell')\big)^{\nearrow} \land I(\ell')$ *with* $C'$ *satisfiable.*

That is, in the parametric zone graph, nodes are symbolic states, and arcs are labeled by *edges* of the original PTA. Given $\mathbf{s} = (\ell, C)$, if $((\ell, C), e, (\ell', C')) \in {\Rightarrow}$, we write $\mathrm{Succ}(\mathbf{s}, e) = (\ell', C')$. By extension, we write $\mathrm{Succ}(\mathbf{s})$ for $\bigcup_{e \in E} \mathrm{Succ}(\mathbf{s}, e)$. Well-known results (see [19]) connect the concrete and the symbolic semantics.

#### **3 Computability and Intractability**

#### **3.1 Minimal-Time Reachability**

The following result is a consequence of a monotonicity property of L/U-PTAs [19]. We can safely replace parameters with some constants in order to compute the solution to the minimal-time reachability problem, which reduces to the minimal-time reachability in a TA, which is PSPACE-complete [17]. All proofs are given in [7].

**Proposition 1 (minimal-time reachability for L/U-PTAs).** *The minimal-time reachability problem for L/U-PTAs is PSPACE-complete.*

Computing the minimal time for which a location is reached (Proposition 1) does not mean that we are able to compute exactly all valuations for which this location is reachable in minimal time. In fact, we show that it is not possible in a formalism for which the emptiness of the intersection is decidable—which notably rules out its representation as a finite union of polyhedra. The proof idea is that representing it in such a formalism would contradict the undecidability of the emptiness problem for (normal) PTAs.

**Proposition 2 (intractability of minimal-time reachability synthesis for L/U-PTAs).** *The solution to the minimal-time reachability synthesis problem for L/U-PTAs cannot be represented in a formalism for which the emptiness of the intersection is decidable.*

### **3.2 Minimal-Parameter Reachability**

For the full class of PTAs, we will see that these problems are clearly out of reach: if it was possible to compute the solution to the minimal-parameter reachability or minimal-parameter reachability synthesis, then it would be possible to answer the reachability emptiness problem—which is undecidable in most settings [6].

We first show that an algorithm for the minimal-parameter synthesis problem can be used to solve the minimal-time synthesis problem, i. e. the minimal-parameter synthesis problem is at least as hard as the minimal-time synthesis problem.

**Lemma 1 (minimal-time from minimal-parameter synthesis).** *An algorithm that solves the minimal-parameter synthesis problem can be used to solve the minimal-time synthesis problem by extending the PTA.*

*Proof.* Assume we are given an arbitrary PTA $A$, a set of target locations $T$, and a global clock $x_{\mathit{global}}$ that never resets. We construct the PTA $A'$ from $A$ by adding a new parameter $p_{\mathit{global}}$, and for every edge $(\ell, g, a, R, \ell')$ in $A$ such that $\ell' \in T$, we replace $g$ by $g \land x_{\mathit{global}} = p_{\mathit{global}}$. Note that when a target location from $T$ is reached, we have that $x_{\mathit{global}} = p_{\mathit{global}}$, hence by minimizing $p_{\mathit{global}}$ we also minimize $x_{\mathit{global}}$. Thus, by solving $\mathit{MinParamSynth}(A', T, p_{\mathit{global}})$, we effectively solve $\mathit{MinTimeSynth}(A, T)$.
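The construction used in this proof can be sketched as a transformation on the edge list, assuming edges are stored as tuples `(source, guard, action, resets, target)` and guards as lists of atoms `(clock, op, rhs)`. The encoding, the function name and the default names `x_global` and `p_global` are illustrative. The sketch only rewrites guards and assumes the global clock has already been added to the model.

```python
def add_global_time_parameter(edges, targets,
                              x_global="x_global", p_global="p_global"):
    """Conjoin x_global = p_global to the guard of every edge that
    enters a target location, leaving all other edges untouched."""
    extended = []
    for source, guard, action, resets, target in edges:
        if x_global in resets:
            raise ValueError("the global clock must never be reset")
        if target in targets:
            guard = guard + [(x_global, "=", p_global)]   # builds a new list
        extended.append((source, guard, action, resets, target))
    return extended

edges = [
    ("l0", [("x", ">=", 1)], "a", {"x"}, "l1"),
    ("l1", [("x", "<=", "p")], "b", set(), "goal"),
]
extended = add_global_time_parameter(edges, {"goal"})
```

Only the edge entering `goal` receives the extra conjunct, so minimizing `p_global` on the extended model minimizes the global time at which the target is entered.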

The following result states that synthesis of the minimal-value of the parameter is intractable for PTAs.

**Proposition 3 (intractability of minimal-parameter reachability for PTAs).** *The solution to the minimal-parameter reachability for PTAs cannot be computed in general.*

*Proof (sketch).* The proof shows that testing equality of "p = 0" against the solution of the minimal-parameter reachability problem for the PTA in Fig. 2 and $\ell_f$ is equivalent to solving reachability emptiness of $\ell_f$ in $A$, which is undecidable [3]. Therefore, the solution cannot be computed in general.

The intractability of minimal-parameter reachability synthesis for PTAs will be implied by the upcoming Proposition 4 in a more restricted setting.

**Fig. 2.** Intractability of minimal-parameter reachability for PTAs

*Intractability of the synthesis for L/U-PTAs.* The following result states that synthesis is intractable for L/U-PTAs. In particular, this rules out the possibility to represent the result using a finite union of polyhedra.

**Proposition 4 (intractability of minimal-parameter reachability synthesis for L/U-PTAs).** *The solution to the minimal-parameter reachability synthesis for L/U-PTAs cannot always be represented in a formalism for which the emptiness of the intersection is decidable and for which the minimization of a variable is computable.*

*Proof.* From Lemma 1 and Proposition 2.

The minimal-parameter reachability problem remains open for L/U-PTAs (see Sect. 7). Despite these negative results, we will define procedures that address not only the class of L/U-PTAs, but in fact the class of full PTAs. Of course, these procedures are not guaranteed to terminate.

#### **4 Minimal Parameter Reachability Synthesis**

We give MinParamSynth(A,T,p) in Algorithm 1. It maintains a set $\mathbf{W}$ of waiting symbolic states, a set $\mathbf{P}$ of passed states, a current optimum $\mathit{Opt}$ and the associated optimal valuations $K$. While $\mathbf{W}$ is not empty, a state is picked in line 6. If it is a target state (i. e. $\ell \in T$) then the projection of its constraint onto $p$ is computed, and the minimum is inferred (line 10). If that minimum improves the known optimum, then the associated parameter valuations $K$ are completely replaced by the one obtained from the current state (i. e. the projection of $C$ onto $P$). Otherwise, if the minimum of $C{\downarrow}_{\{p\}}$ is equal to the known optimum (line 14), then we add (using disjunction) the associated valuations. Finally, if the current state is not a target state, then we compute its successors and add those that have not been visited before to $\mathbf{W}$ in lines 17 and 18.

Note that if **W** is implemented as a FIFO list with "pick" the first element, then this algorithm is a classical BFS procedure.

Also note that if we replace lines 10–15 with the statement K ← K ∨ C↓<sup>P</sup> (i. e. adding the parameter valuations to K every time the algorithm reaches a target location), we obtain the standard synthesis algorithm EFSynth from e. g. [20], that synthesizes all parameter valuations for which a set of locations is reachable.

### **Algorithm 1:** MinParamSynth(A,T,p)

    input : A PTA A with symbolic initial state s0 = (ℓ0, C0), a set of target locations T, a parameter p.
    output : Constraint K over the parameters.
     1  W ← {s0}                         // waiting set
     2  P ← ∅                            // passed set
     3  Opt ← ∞                          // current optimum
     4  K ← ⊥                            // current optimum valuations
     5  while W ≠ ∅ do
     6    Pick s = (ℓ, C) from W
     7    W ← W \ {s}
     8    P ← P ∪ {s}
     9    if ℓ ∈ T then                  // s is a target state
    10      s_opt ← GetMin(C, p)         // compute local optimum
    11      if s_opt < Opt then          // the optimum is strictly better
    12        Opt ← s_opt                // we found a new best optimum: replace it
    13        K ← C↓P                    // completely replace the found valuations
    14      else if s_opt = Opt then     // the optimum is equal to the one known
    15        K ← K ∨ C↓P                // add the found valuations
    16    else                           // otherwise explore successors
    17      for each s' ∈ Succ(s) do
    18        if s' ∉ W ∧ s' ∉ P then W ← W ∪ {s'}
    19  return K

**Fig. 3.** PTA exemplifying Algorithm 1.

*Example 2.* Consider the PTA $A$ in Fig. 3, and run MinParamSynth($A$, $\{\ell_3\}$, $p_1$). The initial state is $\mathbf{s}_1 = (\ell_1, x \geq 0)$ (we omit the trivial constraints $p_i \geq 0$). Its successors $\mathbf{s}_2 = (\ell_3, x \geq 2 \land p_1 > 2)$ and $\mathbf{s}_3 = (\ell_2, x \geq 0 \land p_2 > 1)$ are added to **W**. Pick $\mathbf{s}_2$ from **W**: it is a target, and therefore GetMin($C_2$, $p_1$) is computed, which gives $(2, >)$. Since $(2, >) < \infty$, we found a new minimum, and $K$ becomes $C_2{\downarrow}_P$, i.e. $p_1 > 2$. Pick $\mathbf{s}_3$ from **W**: it is not a target, therefore we compute its successors $\mathbf{s}_4 = (\ell_3, x \geq 2 \land p_1 = 2 \land 1 < p_2 < 2)$ and $\mathbf{s}_5 = (\ell_3, x \geq 2 \land p_1 = p_3 = 2 \land p_2 > 1)$. Pick $\mathbf{s}_4$: it is a target, with GetMin($C_4$, $p_1$) $= (2, =)$. As $(2, =) < (2, >)$, we found a new minimum, and $K$ is replaced with $C_4{\downarrow}_P$, i.e. $p_1 = 2 \land 1 < p_2 < 2$. Pick $\mathbf{s}_5$: it is a target, with GetMin($C_5$, $p_1$) $= (2, =)$. As $(2, =) = (2, =)$, we found an equally good minimum, and $K$ is extended with $C_5{\downarrow}_P$, giving a new $K$ equal to $(p_1 = 2 \land 1 < p_2 < 2) \lor (p_1 = p_3 = 2 \land p_2 > 1)$. As **W** $= \emptyset$, $K$ is returned.
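The run of Example 2 can be replayed with a minimal Python sketch. It is not the IMITATOR implementation: symbolic states are plain names, the result of GetMin and the projected parameter constraint of each state are supplied as precomputed tables (`local_min` and `projection`, both hypothetical names), and an optimum $(c, =)$ or $(c, >)$ is encoded as the pair `(c, False)` or `(c, True)` so that Python's tuple order reproduces $(2, =) < (2, >) < \infty$. Unlike Algorithm 1, the sketch also returns the optimum itself.

```python
from collections import deque
import math

# An optimum is a pair (value, strict): (2, False) encodes (2, =) and
# (2, True) encodes (2, >). Python's tuple order then yields
# (2, False) < (2, True) < (inf, False), i.e. (2, =) < (2, >) < infinity.
INFINITY = (math.inf, False)


def min_param_synth(initial, succ, location, local_min, projection, targets):
    """BFS sketch of Algorithm 1 over an explicit graph of symbolic states."""
    waiting = deque([initial])      # W, used as a FIFO list
    passed = set()                  # P
    opt = INFINITY                  # Opt
    k = []                          # K, kept as a list of disjuncts
    while waiting:
        s = waiting.popleft()
        passed.add(s)
        if location[s] in targets:
            s_opt = local_min[s]    # stands in for GetMin(C, p)
            if s_opt < opt:         # strictly better: replace K
                opt, k = s_opt, [projection[s]]
            elif s_opt == opt:      # equally good: add a disjunct
                k.append(projection[s])
        else:
            for nxt in succ.get(s, []):
                if nxt not in waiting and nxt not in passed:
                    waiting.append(nxt)
    return opt, k


# Data transcribed from Example 2 (constraints kept as plain strings).
succ = {"s1": ["s2", "s3"], "s3": ["s4", "s5"]}
location = {"s1": "l1", "s2": "l3", "s3": "l2", "s4": "l3", "s5": "l3"}
local_min = {"s2": (2, True), "s4": (2, False), "s5": (2, False)}
projection = {
    "s2": "p1 > 2",
    "s4": "p1 = 2 and 1 < p2 < 2",
    "s5": "p1 = p3 = 2 and p2 > 1",
}
opt, k = min_param_synth("s1", succ, location, local_min, projection, {"l3"})
```

Here `opt` ends as `(2, False)` and `k` holds the two disjuncts contributed by $\mathbf{s}_4$ and $\mathbf{s}_5$, matching the example; the disjunct from $\mathbf{s}_2$ is discarded once the strictly better optimum is found.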

Algorithm 1 is a semi-algorithm; if it terminates with result K, then K is a solution for the MinParamSynth problem. Correctness follows from the fact that the algorithm explores the entire parametric zone graph, except for successors of target states (from [19,20] we have that successors of a symbolic state can only restrict the parameter constraint, hence we cannot improve). Furthermore, the minimum is tracked and updated whenever a target state is reached.

We show that synthesis can effectively be achieved for PTAs with a single clock, a decidable subclass.

**Proposition 5 (synthesis for one-clock PTAs).** *The solution to the minimal-parameter reachability synthesis can be computed for 1-clock PTAs using a finite union of polyhedra.*

### **5 Minimal Time Reachability Synthesis**

For minimal-time reachability and synthesis, we assume that the PTA contains a global clock x*global* that is never reset. Otherwise, we extend the PTA by simply adding a 'dummy' clock x*global* without any associated guards, invariants or resets.

```
Algorithm 2: MinTimeSynth(A, T, x_global)
  input : A PTA A with symbolic initial state s0 = (ℓ0, C0), a set of target locations T,
          a global clock x_global that is never reset.
  output : Minimal time T_opt and constraint K over the parameters.
 1  Q ← {(0, s0)}                          // priority queue ordered by time
 2  P ← ∅                                  // passed set
 3  K ← ⊥                                  // current optimum parameter valuations
 4  T_opt ← ∞                              // current optimum time
 5  while Q ≠ ∅ do
 6    (t, s = (ℓ, C)) ← Q.Pop()            // take head of the queue and remove it
 7    P ← P ∪ {s}
 8    if t > T_opt then break
 9    else if ℓ ∈ T then                   // when s is a target state and t ≤ T_opt
10      K ← K ∨ (C ∧ x_global = t)↓P       // valuations for which t = T_opt
11    else                                 // otherwise explore successors
12      for each s' ∈ Succ(s) do
13        if s' ∈ Q ∨ s' ∈ P then continue // ignore seen states
14        t' ← GetMin(s'.C, x_global)      // get minimal time of s'.C
15        if t' ≤ T_opt then               // only add states not exceeding T_opt
16          if s'.ℓ ∈ T ∧ t' < T_opt then
17            T_opt ← t'                   // new lower time to target
18          Q.Push((t', s'))               // add to the priority queue
19  return (T_opt, K)
```
We give MinTimeSynth(A,T,x*global*) in Algorithm 2. We maintain a *priority queue* **Q** of waiting symbolic states and order these by their minimal time (for the initial state this is 0). We further maintain a set **P** of passed states, a current time optimum T*opt* (initially ∞), and the associated optimal valuations K. We first explain the synthesis algorithm and then the reachability variant.

*Minimal-time reachability synthesis.* While **Q** is not empty, the state with the lowest associated minimal time t is popped from the head of the queue (line 6). If this time t is larger than T*opt* (line 8), then this also holds for all remaining states in **Q**. Also all successor states from **s** (or successors of any state from **Q**) cannot have a better minimal time, thus we can end the algorithm.

Otherwise, if **s** is a target state, we assume that t ≮ T*opt* and thus t = T*opt* (we guarantee this property when pushing states to the queue). Before adding the parameter valuations to K in line 10, we intersect the constraint with x*global* = t in case the clock value depends on parameters, e.g. if C is x*global* = p.<sup>4</sup>

If **s** is not a target state, then we consider its successors in lines 12–18. We ignore states that have been visited before (line 13), and compute the minimal time t′ of each remaining successor **s**′ in line 14. We compare t′ with T*opt* in line 15. All successor states for which t′ exceeds T*opt* are ignored, as they cannot improve the result.

If **s**′ is a target state and t′ < T*opt*, then we update T*opt*. Finally, the successor state is pushed to the priority queue in line 18. Note that we preserve the property that t ≮ T*opt* for the states in **Q**.

*Minimal-time reachability.* When we are interested in just a single parameter valuation, we may end the algorithm early. The algorithm can be terminated as soon as it reaches line 10. We can assert at this point that T*opt* will not decrease any further, since all remaining unexplored states have a minimal time that is larger than or equal to T*opt*.

Algorithm 2 is a semi-algorithm; if it terminates with result (T*opt*, K), then K is a solution for the MinTimeSynth problem. Correctness follows from the fact that the algorithm explores exactly all symbolic states in the parametric zone graph that can be reached in at most T*opt* time, except for successors of target states. Note (again) that successors of a symbolic state can only restrict the parameter constraint. Furthermore, T*opt* is checked and updated for every encountered successor to ensure that the first time a target state is popped from the priority queue **Q**, it is reached in T*opt* time (after which T*opt* never changes).
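The control flow of Algorithm 2 can be illustrated with a small Python sketch built on the standard `heapq` module. It is a simplification under stated assumptions, not the tool's implementation: symbolic states are names, GetMin is replaced by a precomputed table `min_time` of plain numbers (strict bounds of the form $(c, >)$ are ignored), and the projected constraint of line 10 is replaced by a string from the hypothetical table `valuations`. The flag `first_only` mimics the reachability variant, which stops as soon as line 10 is reached.

```python
import heapq
import math


def min_time_synth(initial, succ, location, min_time, valuations, targets,
                   first_only=False):
    """Sketch of Algorithm 2 over an explicit graph of symbolic states."""
    queue = [(0, initial)]            # Q, ordered by minimal time
    queued = {initial}                # membership test for Q
    passed = set()                    # P
    k = []                            # K, kept as a list of disjuncts
    t_opt = math.inf                  # T_opt
    while queue:
        t, s = heapq.heappop(queue)
        queued.discard(s)
        passed.add(s)
        if t > t_opt:                 # nothing left can improve T_opt
            break
        elif location[s] in targets:  # here t == T_opt
            k.append(valuations[s])   # stands in for (C and x_global = t) projected
            if first_only:            # reachability variant: one valuation suffices
                break
        else:
            for nxt in succ.get(s, []):
                if nxt in queued or nxt in passed:
                    continue          # ignore seen states
                t_next = min_time[nxt]
                if t_next <= t_opt:   # only keep states not exceeding T_opt
                    if location[nxt] in targets and t_next < t_opt:
                        t_opt = t_next
                    heapq.heappush(queue, (t_next, nxt))
                    queued.add(nxt)
    return t_opt, k


# Hypothetical toy graph: three target states, reached at times 3, 3 and 5.
succ = {"a": ["b", "e", "f"], "b": ["c", "d"]}
location = {"a": "l0", "b": "l1", "c": "goal", "d": "goal", "e": "goal", "f": "l2"}
min_time = {"b": 1, "c": 3, "d": 3, "e": 5, "f": 4}
valuations = {"c": "p = 3", "d": "p = 3 and q > 1", "e": "p = 5"}
t_opt, k = min_time_synth("a", succ, location, min_time, valuations, {"goal"})
```

In this toy run the target `e` first sets the bound to 5, which is then lowered to 3 when `c` is discovered; `c` and `d` both contribute to `k`, and the loop stops when `f` is popped with time 4.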

### **6 Experiments**

We implemented all our algorithms in the IMITATOR tool [9] and compared their performance with the standard (non-minimization) EFSynth parameter synthesis algorithm from [20]. For the experiments, we are interested in analysing the performance (in the form of computation time) of each algorithm, and comparing that with the performance of standard synthesis.

*Benchmark models.* We collected PTA models and properties from the IMITATOR benchmarks library [5], which contains numerous benchmark models from scientific and industrial domains. We selected all models with reachability properties and extended these to include: (1) a new clock variable that represents the global time x*global*, i.e. a clock that is never reset, and (2) a new parameter p*global* along with the linear term x*global* = p*global* for every transition that targets a goal location, to ensure that when minimizing p*global* we effectively minimize x*global*. In total we have 68 models, and for every experiment we used the extended model that includes both the global time clock x*global* and the corresponding parameter p*global*.

<sup>4</sup> In case t is of the form (c, >) with c ∈ ℚ<sub>+</sub>, the intersection of C with the linear term x*global* = t would result in ⊥, as the exact value t is not part of the constraint. In the implementation, we intersect C with x*global* = t + ε, for a small ε > 0.

*Subsumption.* For each algorithm that we consider, it is possible to reduce the search space with two reduction techniques: state inclusion and state merging.


State inclusion is a relatively inexpensive computational task and preliminary results showed that it caused the algorithm to perform equally fast or faster than without the check. Checking for merging is however a computationally expensive procedure and thus should not be performed for every newly found state. For all BFS-based algorithms (standard synthesis and minimal-parameter synthesis) we merge every BFS layer. For the minimal-time synthesis algorithm, we empirically studied various merging heuristics and found that merging every ten iterations of the algorithm yielded the best results. We assume that both the inclusion and merging state-space reductions are used in all experiments (all computation times include the overhead of the reductions), unless otherwise mentioned.

*Run configurations.* For the experiments we used the following six configurations: EFSynth, MTSynth, MPSynth, MTSynth-noRed, MTReach, and MPReach.


*Experimental setup.* We performed all our experiments on an Intel® Core™ i7-4710MQ processor with 2.50 GHz and 7.4 GiB memory, using a single thread. The six run configurations were executed on each benchmark model, with a timeout of 3600 s. All our models, results, and information on how to reproduce the results are available at https://github.com/utwente-fmt/OptTime-TACAS19.

**Results.** The results of our experiments are displayed in Fig. 4.

MTSynth vs EFSynth. We observe that for most of the models MTSynth clearly outperforms EFSynth. This is to be expected since all states that take more than the minimal time can be ignored. Note that the experiments that appear on a vertical line between 0.1 s < x < 1 s are scaled-up variants of the same model, indicating that this scaling does not affect minimal-time synthesis. Finally, the model plotted at (1346, 52) does not heavily modify the clocks. As a consequence, MTSynth has to explore most of the state space while continuously having to extract the time constraints, making it inefficient.

**Fig. 4.** Scatterplot comparisons of different algorithm configurations. The marks on the red dashed line did not finish computing within the allowed time (3600 s). (Color figure online)

MPSynth vs EFSynth. We can see that MPSynth performs more similarly to EFSynth than MTSynth does, which is to be expected as the algorithms differ less. Still, MPSynth significantly outperforms EFSynth. This is also because fewer states have to be explored to guarantee optimality (once a parameter exceeds the minimal value, all its successors can be ignored).

MTSynth vs MPSynth. Here, we find that MTSynth outperforms MPSynth, similar to the comparison with EFSynth. The results also show a second scalable model around (0.003, 10) and we see that MPSynth is able to solve the 'bad performing model' for MTSynth as quickly as EFSynth. Still, we can conclude that the minimal-time synthesis problem is in general more efficiently solved with the MTSynth algorithm.

MTSynth vs MTSynth-noRed. Here we can see the advantage of using the inclusion and merging reductions to reduce the search space. For most models there is a non-existent to slight improvement, but for others it makes a large difference. While there is some computational overhead in performing these reductions, this overhead is not significant enough to outweigh their benefits.

MTReach vs MTSynth. With MTReach we expect faster execution times as the algorithm terminates once a parameter valuation is found. The experiments show that this is indeed the case (mostly visible from the timeout line). However, we also observe that for quite a few models the difference is not as significant, implying that synthesis results can often be quickly obtained once a single minimal-time valuation is found.

MPReach vs MPSynth. Here we also expect MPReach to be faster than its synthesis variant. While it does quickly solve six instances for which MPSynth timed out, other than that there is no real performance gain. We also argue here that synthesis is obtained quickly when a minimal parameter bound is found. Of course we are effectively computing a minimal global time, so results may change when a different parameter is minimized.

### **7 Conclusion**

We have designed and implemented several algorithms to solve the minimal-time parameter synthesis and related problems for PTAs. From our experiments we observed in general that minimal-time reachability synthesis is in fact faster to compute than standard synthesis. We further show that synthesis while minimizing a parameter is also more efficient, and that existing search space reductions apply well to our algorithms.

Aside from the performance improvement, we deem minimal-time reachability synthesis to be useful in practice. It allows for evaluating which parameter valuations guarantee that the goal is reached in minimal time. We consider it particularly valuable when reasoning about real-time systems.

On the theoretical side, we did not address the minimal-parameter reachability problem for L/U-PTAs (we only showed intractability of the synthesis). While finding the minimal valuation of a given lower-bound parameter is trivial (the answer is 0 iff the target location is reachable), finding the minimum of an upper-bound parameter boils down to reachability-synthesis for U-PTAs, a problem that remains open in general (it is only solvable for integer-valued parameters [15]), as well as to shrinking timed automata [23], but with 0-coefficients in the shrinking vector—not allowed in [23].

A direction for future work is to improve performance by exploiting parallelism. Parallel random search could significantly speed up the computation process, as demonstrated for timed automata [24,25]. Another interesting research direction is to look at maximizing the time to reach the target, or to minimize the *upper-bound* time to reach the target (e. g. for minimizing the worst-case response-time in real-time systems); a preliminary study suggests that the latter problem is significantly more complex than the minimal-time synthesis problem. One may also study other quantitative criteria, e. g. minimizing cost parameters.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Environmentally-Friendly GR(1) Synthesis**

Rupak Majumdar<sup>1</sup>, Nir Piterman<sup>2</sup>, and Anne-Kathrin Schmuck<sup>1(B)</sup>

<sup>1</sup> MPI-SWS, Kaiserslautern, Germany, akschmuck@mpi-sws.org

<sup>2</sup> University of Leicester, Leicester, UK

**Abstract.** Many problems in reactive synthesis are stated using two formulas—an *environment assumption* and a *system guarantee*—and ask for an implementation that satisfies the guarantee in environments that satisfy their assumption. Reactive synthesis tools often produce strategies that formally satisfy such specifications by actively preventing an environment assumption from holding. While formally correct, such strategies do not capture the intention of the designer. We introduce an additional requirement in reactive synthesis, *non-conflictingness*, which asks that a system strategy should always allow the environment to fulfill its liveness requirements. We give an algorithm for solving GR(1) synthesis that produces non-conflicting strategies. Our algorithm is given by a 4-nested fixed point in the *µ*-calculus, in contrast to the usual 3-nested fixed point for GR(1). Our algorithm ensures that, in every environment that satisfies its assumptions on its own, traces of the resulting implementation satisfy both the assumptions and the guarantees. In addition, the asymptotic complexity of our algorithm is the same as that of the usual GR(1) solution. We have implemented our algorithm and show how its performance compares to the usual GR(1) synthesis algorithm.

### **1 Introduction**

Reactive synthesis from temporal logic specifications provides a methodology to automatically construct a system implementation from a declarative specification of correctness. Typically, reactive synthesis starts with a set of requirements on the system and a set of assumptions about the environment. The objective of the synthesis tool is to construct an implementation that ensures all guarantees are met in every environment that satisfies all the assumptions; formally, the synthesis objective is an implication $A \Rightarrow G$. In many synthesis problems, the system can actively influence whether an environment satisfies its assumptions. In such cases, an implementation that prevents the environment from satisfying its assumptions is considered correct for the specification: since the antecedent of the implication $A \Rightarrow G$ does not hold, the property is satisfied.

N. Piterman—Supported by project "d-SynMA" that is funded by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772459).

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 229–246, 2019. https://doi.org/10.1007/978-3-030-17465-1\_13

**Fig. 1.** Pictorial representation of a *desired* strategy for a robot (square) moving in a maze in presence of a moving obstacle (circle). Obstacle and robot start in the lower left and right corner, can move at most one step at a time (to non-occupied cells) and cells that they should visit infinitely often are indicated in light and dark gray (see $q_0$), respectively. Nodes with self-loops ($q_{\{1,3,6,8\}}$) can be repeated finitely often with the obstacle located at one of the dotted positions.

Such implementations satisfy the letter of the specification but not its intent. Moreover, assumption-violating implementations are not a theoretical curiosity but are regularly produced by synthesis tools such as slugs [14]. In recent years, a lot of research has thus focused on how to model environment assumptions [2,4,5,11,18], so that assumption-violating implementations are ruled out. Existing research either removes the "zero sum" assumption on the game by introducing different levels of co-operation [5], by introducing equilibrium notions inspired by non-zero sum games [7,16,20], or by introducing richer quantitative objectives on top of the temporal specifications [1,3].

**Contribution.** In this paper, we take an alternative approach. We consider the setting of GR(1) specifications, where assumptions and guarantees are both conjunctions of safety and Büchi properties [6]. GR(1) has emerged as an expressive specification formalism [17,24,28] and, unlike full linear temporal logic, synthesis for GR(1) can be implemented in time quadratic in the state/transition space. In our approach, the environment is assumed to satisfy its assumptions provided the system does not prevent this. Conversely, the system is required to pick a strategy that ensures the guarantees whenever the assumptions are satisfied, but additionally ensures *non-conflictingness*: along each finite prefix of a play according to the strategy, there exists the persistent possibility for the environment to play such that its liveness assumptions will be met.

Our main contribution is to show a μ-calculus characterization of winning states (and winning strategies) that rules out system strategies that are winning by preventing the environment from fulfilling its assumptions. Specifically, we provide a 4-nested fixed point that characterizes winning states and strategies that are *non-conflicting* and ensure all guarantees are met if all the assumptions are satisfied. Thus, if the environment promises to satisfy its assumption if allowed, the resulting strategy ensures both the assumption and the guarantee.

Our algorithm does not introduce new notions of winning, or new logics or winning conditions. Moreover, since μ-calculus formulas with $d$ alternations can be computed in $O(n^{d/2})$ time [8,26], the $O(n^2)$ asymptotic complexity for the new symbolic algorithm is the same as the standard GR(1) algorithm.

**Motivating Example.** Consider a small two-dimensional maze with 3 × 2 cells as depicted in Fig. 1, state $q_0$. A robot (square) and an obstacle (circle) are located in this maze and can move at most one step at a time to non-occupied cells. There is a wall between the lower and upper left cell and the lower and upper right cell. The interaction between the robot and the obstacle is as follows: first the environment chooses where to move the obstacle to, and, after observing the new location of the obstacle, the robot chooses where to move.

**Fig. 2.** Pictorial representation of the *GR(1) winning strategy* synthesized by slugs for the robot (square) in the game described in Fig. 1.

Our objective is to synthesize a strategy for the robot such that it visits both the upper left and the lower right corner of the maze (indicated in dark gray in Fig. 1, state $q_0$) infinitely often. Due to the walls in the maze the robot needs to cross the two white middle cells infinitely often to fulfill this task. If we assume an arbitrary, adversarial behavior of the environment (e.g., placing the obstacle in one white cell and never moving it again) this desired robot behavior cannot be enforced. We therefore assume that the obstacle is actually another robot that is required to visit the lower left and the upper right corner of the maze (indicated in light gray in Fig. 1, state $q_0$) infinitely often. While we do not know the precise strategy of the other robot (i.e., the obstacle), its liveness assumption is enough to infer that the obstacle will always eventually free the white cells. Under this assumption the considered synthesis problem has a solution.

Let us first discuss one intuitive strategy for the robot in this scenario, as depicted in Fig. 1. We start in $q_0$ with the obstacle (circle) located in the lower left corner and the robot (square) located in the lower right corner. Recall that the obstacle will eventually move towards the upper right corner. The robot can therefore wait until it does so, indicated by $q_1$. Here, the dotted circles denote possible locations of the obstacle during the (finitely many) repetitions of $q_1$ by following its self-loop. Whenever the obstacle moves to the upper part of the maze, the robot moves into the middle part ($q_2$). Now it waits until the obstacle reaches its goal in the upper right, which is ensured to happen after a finite number of visits to $q_3$. When the obstacle reaches the upper right, the robot moves up as well ($q_4$). Now the robot can freely move to its goal in the upper left ($q_5$). This process symmetrically repeats for moving back to the respective goals in the lower part of the maze ($q_6$ to $q_9$ and then back to $q_0$). With this strategy, the interaction between environment and system goes on for infinitely many cycles and the robot fulfills its specification.

The outlined synthesis problem can be formalized as a two player game with GR(1) winning condition. When solving this synthesis problem using the tool slugs [14], we obtain the strategy depicted in Fig. 2 (not the desired one in Fig. 1). The initial state, denoted by $q_0$, is the same as in Fig. 1 and if the environment moves the obstacle into the middle passage ($q_1$) the robot reacts as before; it waits until the obstacle eventually proceeds to the upper part of the maze ($q_2$). However, after this happens the robot takes the chance to simply move to the lower left cell of the maze and stays there forever ($q_3$). By this, the robot prevents the environment from fulfilling its objective. Similarly, if the obstacle does not immediately start moving in $q_0$, the robot takes the chance to place itself in the middle passage and stays there forever ($q_4$). This obviously prevents the environment from fulfilling its liveness properties.

In contrast, when using our new algorithm to solve the given synthesis problem, we obtain the strategy given in Fig. 1, which satisfies the guarantees while allowing the environment assumptions to be satisfied.

**Related Work.** Our algorithm is inspired by supervisory controller synthesis for non-terminating processes [23,27], resulting in a fixed-point algorithm over a Rabin-Büchi automaton. This algorithm has been simplified for two interacting Büchi automata in [22] without proof. We adapt this algorithm to GR(1) games and provide a new, self-contained proof in the framework of two-player games, which is distinct from the supervisory controller synthesis setting (see [13,25] for a recent comparison of both frameworks).

The problem of correctly handling assumptions in synthesis has recently gained attention in the reactive synthesis community [4]. As our work does not assume precise knowledge about the environment strategy (or the ability to impose the latter), it is distinct from cooperative approaches such as assume-guarantee [9] or rational synthesis [16]. It is most closely related to obliging games [10], cooperative reactive synthesis [5], and assume-admissible synthesis [7]. Obliging games [10] incorporate a similar notion of non-conflictingness as our work, but do not condition winning of the system on the environment fulfilling the assumptions. This makes obliging games harder to win. Cooperative reactive synthesis [5] tries to find a winning strategy enforcing $A \cap G$. If this specification is not realizable, it is relaxed and the obtained system strategy enforces the guarantees if the environment cooperates "in the right way". Instead, our work always assumes the same form of cooperation, coinciding with just one cooperation level in [5]. Assume-admissible synthesis [7] for two players results in two individual synthesis problems. Given that both have a solution, only implementing the system strategy ensures that the game will be won if the environment plays *admissibly*. This is comparable to the view taken in this paper; however, assuming that the environment plays *admissibly* is stronger than our assumption on an environment attaining its liveness properties if not prevented from doing so. Moreover, we only need to solve one synthesis problem, instead of two. However, it should be noted that [5,7,10] handle ω-regular assumptions and guarantees. We focus on the practically important GR(1) fragment and our method better leverages the computational benefits for this fragment.

All proofs of our results and additional examples can be found in the extended version [21]. We further acknowledge that the same problem was independently solved in the context of reactive robot mission plans [12] which was brought to our attention only shortly before the final submission of this paper.

### **2 Two Player Games and the Synthesis Problem**

#### **2.1 Two Player Games**

**Formal Languages.** Let $\Sigma$ be a finite alphabet. We write $\Sigma^*$, $\Sigma^+$, and $\Sigma^\omega$ for the sets of finite words, non-empty finite words, and infinite words over $\Sigma$. We write $w \leq v$ (resp., $w < v$) if $w$ is a prefix of $v$ (resp., a strict prefix of $v$). The set of all prefixes of a word $w \in \Sigma^\omega$ is denoted $\mathrm{pfx}(w) \subseteq \Sigma^*$. For $L \subseteq \Sigma^*$, we have $L \subseteq \mathrm{pfx}(L)$. For $\mathcal{L} \subseteq \Sigma^\omega$ we denote by $\overline{\mathcal{L}}$ its complement $\Sigma^\omega \setminus \mathcal{L}$.
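As a small illustration of the prefix notation, the following Python sketch works on finite words only (infinite words cannot be enumerated, so this is a finite analogue). The function names `is_prefix`, `is_strict_prefix`, `pfx` and `pfx_lang` are chosen here for illustration and do not come from the paper.

```python
def is_prefix(w, v):
    """w <= v: w is a prefix of v."""
    return v[:len(w)] == w


def is_strict_prefix(w, v):
    """w < v: w is a prefix of v and differs from v."""
    return is_prefix(w, v) and len(w) < len(v)


def pfx(w):
    """All prefixes of a finite word, including the empty word and w itself."""
    return {w[:i] for i in range(len(w) + 1)}


def pfx_lang(language):
    """Prefix closure of a finite set of finite words."""
    closure = set()
    for w in language:
        closure |= pfx(w)
    return closure


language = {"ab", "c"}
closure = pfx_lang(language)   # contains "", "a", "ab" and "c"
```

Since every word is a prefix of itself, `language <= closure` holds, which is the finite counterpart of the inclusion of $L$ in $\mathrm{pfx}(L)$ stated above.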

**Game Graphs and Strategies.** A *two player game graph* $H = (Q^0, Q^1, \delta^0, \delta^1, q_0)$ consists of two finite disjoint state sets $Q^0$ and $Q^1$, two transition functions $\delta^0 : Q^0 \to 2^{Q^1}$ and $\delta^1 : Q^1 \to 2^{Q^0}$, and an initial state $q_0 \in Q^0$. We write $Q = Q^0 \cup Q^1$. Given a game graph $H$, a *strategy* for player 0 is a function $f^0 : (Q^0 Q^1)^* Q^0 \to Q^1$; it is *memoryless* if $f^0(\nu q^0) = f^0(q^0)$ for all $\nu \in (Q^0 Q^1)^*$ and all $q^0 \in Q^0$. A *strategy* $f^1 : (Q^0 Q^1)^+ \to Q^0$ for player 1 is defined analogously. The infinite sequence $\pi \in (Q^0 Q^1)^\omega$ is called a *play* over $H$ if $\pi(0) = q_0$ and for all $k \in \mathbb{N}$ it holds that $\pi(2k+1) \in \delta^0(\pi(2k))$ and $\pi(2k+2) \in \delta^1(\pi(2k+1))$; $\pi$ is *compliant* with $f^0$ and/or $f^1$ if additionally $f^0(\pi|_{[0,2k]}) = \pi(2k+1)$ and/or $f^1(\pi|_{[0,2k+1]}) = \pi(2k+2)$ holds. We denote by $\mathcal{L}(H, f^0)$, $\mathcal{L}(H, f^1)$ and $\mathcal{L}(H, f^0, f^1)$ the sets of plays over $H$ compliant with $f^0$, with $f^1$, and with both $f^0$ and $f^1$, respectively.
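The definitions of game graph, play and compliance can be made concrete with a minimal Python sketch. Because plays are infinite, the sketch only checks finite prefixes of plays; the class `GameGraph`, the helper names and the four-state example are hypothetical and serve only to illustrate the alternation between $\delta^0$ and $\delta^1$.

```python
from dataclasses import dataclass


@dataclass
class GameGraph:
    delta0: dict   # maps each player-0 state to its set of player-1 successors
    delta1: dict   # maps each player-1 state to its set of player-0 successors
    initial: str   # initial state, owned by player 0


def is_play_prefix(h, pi):
    """Check that pi starts in the initial state and alternates delta0/delta1."""
    if not pi or pi[0] != h.initial:
        return False
    for i in range(len(pi) - 1):
        delta = h.delta0 if i % 2 == 0 else h.delta1
        if pi[i + 1] not in delta.get(pi[i], set()):
            return False
    return True


def compliant_with_f0(pi, f0):
    """Check f0(pi[0..2k]) == pi[2k+1] for every player-0 position in the prefix."""
    return all(f0(pi[: i + 1]) == pi[i + 1] for i in range(0, len(pi) - 1, 2))


# Hypothetical example: player 0 owns a and c, player 1 owns b and d.
h = GameGraph(
    delta0={"a": {"b", "d"}, "c": {"b"}},
    delta1={"b": {"a", "c"}, "d": {"a"}},
    initial="a",
)


def f0(history):
    """A memoryless strategy: the choice depends only on the last state."""
    return {"a": "b", "c": "b"}[history[-1]]


good = ["a", "b", "c", "b", "a"]
other = ["a", "d", "a"]
```

Both `good` and `other` are valid play prefixes of `h`, but only `good` is compliant with `f0`, since the strategy never moves from `a` to `d`.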

**Winning Conditions.** We consider winning conditions defined over sets of states of a given game graph $H$. Given $F \subseteq Q$, we say a play $\pi$ satisfies the *Büchi condition* $F$ if $\mathrm{Inf}(\pi) \cap F \neq \emptyset$, where $\mathrm{Inf}(\pi) = \{q \in Q \mid \pi(k) = q \text{ for infinitely many } k \in \mathbb{N}\}$. Given a set $\mathcal{F} = \{F_1, \dots, F_m\}$, where each $F_i \subseteq Q$, we say a play $\pi$ satisfies the *generalized Büchi condition* $\mathcal{F}$ if $\mathrm{Inf}(\pi) \cap F_i \neq \emptyset$ for each $i \in [1; m]$. We additionally consider generalized reactivity winning conditions with rank 1 (GR(1) winning conditions for short). Given two generalized Büchi conditions $\mathcal{F}^0 = \{F^0_1, \dots, F^0_m\}$ and $\mathcal{F}^1 = \{F^1_1, \dots, F^1_n\}$, a play $\pi$ satisfies the GR(1) condition if either $\mathrm{Inf}(\pi) \cap F^0_i = \emptyset$ for some $i \in [1; m]$ or $\mathrm{Inf}(\pi) \cap F^1_j \neq \emptyset$ for each $j \in [1; n]$. That is, whenever the play satisfies $\mathcal{F}^0$, it also satisfies $\mathcal{F}^1$. We use the tuples $(H, F)$, $(H, \mathcal{F})$ and $(H, \mathcal{F}^0, \mathcal{F}^1)$ to denote a Büchi, generalized Büchi and GR(1) game over $H$, respectively, and collect all winning plays in these games in the sets $\mathcal{L}(H, F)$, $\mathcal{L}(H, \mathcal{F})$ and $\mathcal{L}(H, \mathcal{F}^0, \mathcal{F}^1)$.
A strategy $f^l$ is *winning* for player $l$ in a Büchi, generalized Büchi, or GR(1) game, if $\mathcal{L}(H, f^l)$ is contained in the respective set of winning plays.
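The three winning conditions are easy to evaluate on ultimately periodic ("lasso") plays, where $\mathrm{Inf}(\pi)$ is exactly the set of states on the repeated cycle. The following is a minimal sketch under that assumption; the function names and the example states are hypothetical and not taken from the paper.

```python
def inf_states(prefix, cycle):
    """Inf(pi) for the lasso play prefix . cycle^omega (the prefix is irrelevant)."""
    return set(cycle)

def buchi(inf, F):
    """Buechi condition: Inf(pi) intersects F."""
    return bool(inf & F)

def generalized_buchi(inf, Fs):
    """Generalized Buechi condition: Inf(pi) intersects every set in Fs."""
    return all(buchi(inf, F) for F in Fs)

def gr1(inf, assumptions, guarantees):
    """GR(1): some assumption set is missed, or all guarantee sets are hit."""
    return (not generalized_buchi(inf, assumptions)) or generalized_buchi(inf, guarantees)

ASSUMPTIONS = [{"e2"}]   # hypothetical F^0
GUARANTEES = [{"s2"}]    # hypothetical F^1

# Wins only because the assumption set is never visited on the cycle.
wins_by_falsifying = gr1(inf_states(["e0", "s0"], ["e1", "s1"]), ASSUMPTIONS, GUARANTEES)
# Wins by visiting both the assumption and the guarantee set infinitely often.
wins_properly = gr1(inf_states([], ["e0", "s0", "e2", "s2"]), ASSUMPTIONS, GUARANTEES)
# Loses: the assumption is met but the guarantee is not.
loses = gr1(inf_states([], ["e2", "s3"]), ASSUMPTIONS, GUARANTEES)
```

The first example already hints at the issue studied later: a play can satisfy the GR(1) condition without ever letting the environment meet its assumption.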

**Set Transformers on Games.** Given a game graph H, we define the existential, universal, and player 0-, and player 1-controllable pre-operators. Let <sup>P</sup> <sup>⊆</sup> <sup>Q</sup>.

$$\operatorname{Pre}^{\exists}(P) = \left\{ q^0 \in Q^0 \middle| \delta^0(q^0) \cap P \neq \emptyset \right\} \cup \left\{ q^1 \in Q^1 \middle| \delta^1(q^1) \cap P \neq \emptyset \right\}, \text{ and} \tag{1}$$

$$\operatorname{Pre}^{\forall}(P) = \left\{ q^{0} \in Q^{0} \middle| \delta^{0}(q^{0}) \subseteq P \right\} \cup \left\{ q^{1} \in Q^{1} \middle| \delta^{1}(q^{1}) \subseteq P \right\},\tag{2}$$

$$\mathsf{Pre}^{0}(P) = \left\{ q^{0} \in Q^{0} \middle| \delta^{0}(q^{0}) \cap P \neq \emptyset \right\} \cup \left\{ q^{1} \in Q^{1} \middle| \delta^{1}(q^{1}) \subseteq P \right\}, \text{ and} \tag{3}$$

$$\operatorname{Pre}^1(P) = \left\{ q^0 \in Q^0 \middle| \delta^0(q^0) \subseteq P \right\} \cup \left\{ q^1 \in Q^1 \middle| \delta^1(q^1) \cap P \neq \emptyset \right\}.\tag{4}$$

Observe that $Q \setminus \mathsf{Pre}^{\exists}(P) = \mathsf{Pre}^{\forall}(Q \setminus P)$ and $Q \setminus \mathsf{Pre}^{1}(P) = \mathsf{Pre}^{0}(Q \setminus P)$.

We combine the operators in (1)–(4) to define a *conditional predecessor* $\mathsf{CondPre}$ and its dual $\overline{\mathsf{CondPre}}$ for sets $P, P' \subseteq Q$ by

$$\mathsf{Cond}\mathsf{Pre}(P, P') := \mathsf{Pre}^{\exists}(P) \cap \mathsf{Pre}^{1}(P \cup P'), \text{ and} \tag{5}$$

$$\overline{\mathsf{CondPre}}(P,P') := \mathsf{Pre}^{\forall}(P) \cup \mathsf{Pre}^{0}(P \cap P').\tag{6}$$

We see that $Q \setminus \mathsf{CondPre}(P, P') = \overline{\mathsf{CondPre}}(Q \setminus P, Q \setminus P')$.
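The set transformers (1)–(6) and the stated dualities can be checked on a small explicit game graph. The sketch below is only an illustration: the toy graph and all Python names are assumptions, and an actual symbolic implementation would operate on BDDs rather than on enumerated sets.

```python
from itertools import combinations

# Hypothetical toy game graph (not from the paper).
DELTA0 = {"a": {"x", "y"}, "b": {"y"}}   # delta^0 : Q^0 -> 2^{Q^1}
DELTA1 = {"x": {"a"}, "y": {"a", "b"}}   # delta^1 : Q^1 -> 2^{Q^0}
Q = frozenset(DELTA0) | frozenset(DELTA1)

def pre_exists(P):          # (1): some successor lies in P
    return frozenset(q for q, s in {**DELTA0, **DELTA1}.items() if s & P)

def pre_forall(P):          # (2): all successors lie in P
    return frozenset(q for q, s in {**DELTA0, **DELTA1}.items() if s <= P)

def pre_0(P):               # (3): player 0 can force the next state into P
    return (frozenset(q for q, s in DELTA0.items() if s & P)
            | frozenset(q for q, s in DELTA1.items() if s <= P))

def pre_1(P):               # (4): player 1 can force the next state into P
    return (frozenset(q for q, s in DELTA0.items() if s <= P)
            | frozenset(q for q, s in DELTA1.items() if s & P))

def cond_pre(P, P2):        # (5)
    return pre_exists(P) & pre_1(P | P2)

def cond_pre_dual(P, P2):   # (6)
    return pre_forall(P) | pre_0(P & P2)

def subsets(S):
    items = sorted(S)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def dualities_hold():
    """Exhaustively check the three complement identities on the toy graph."""
    for P in subsets(Q):
        if Q - pre_exists(P) != pre_forall(Q - P):
            return False
        if Q - pre_1(P) != pre_0(Q - P):
            return False
        for P2 in subsets(Q):
            if Q - cond_pre(P, P2) != cond_pre_dual(Q - P, Q - P2):
                return False
    return True
```

On this graph, `pre_1(frozenset({"a"}))` contains both player 1 states, since each of them has a move to `a`, whereas `pre_0(frozenset({"a"}))` contains only `x`, the one player 1 state with no alternative to `a`.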

*µ***-Calculus.** We use the μ-calculus as a convenient logical notation to define symbolic algorithms (i.e., algorithms that manipulate sets of states rather than individual states) for computing a set of states with a particular property over a given game graph $H$. The formulas of the μ-calculus, interpreted over a two-player game graph $H$, are given by the grammar

$$\varphi ::= p \mid X \mid \varphi \cup \varphi \mid \varphi \cap \varphi \mid pre(\varphi) \mid \mu X.\varphi \mid \nu X.\varphi$$

where $p$ ranges over subsets of $Q$, $X$ ranges over a set of formal variables, $pre \in \{\mathsf{Pre}^{\exists}, \mathsf{Pre}^{\forall}, \mathsf{Pre}^{0}, \mathsf{Pre}^{1}, \mathsf{CondPre}, \overline{\mathsf{CondPre}}\}$ ranges over set transformers, and $\mu$ and $\nu$ denote, respectively, the least and greatest fixpoint of the functional defined as $X \mapsto \varphi(X)$. Since the operations $\cup$, $\cap$, and the set transformers $pre$ are all monotonic, the fixpoints are guaranteed to exist. A μ-calculus formula evaluates to a set of states over $H$, and the set can be computed by induction over the structure of the formula, where the fixpoints are evaluated by iteration. We omit the (standard) semantics of formulas [19].
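Evaluation "by iteration" can be made concrete as follows. This is a minimal sketch on a hypothetical successor map: a least fixpoint is iterated upward from $\emptyset$, a greatest fixpoint downward from $Q$, and monotonicity of the functional guarantees termination on a finite state set.

```python
# Hypothetical successor map; player ownership is irrelevant for Pre-exists.
SUCC = {"a": {"x"}, "x": {"b"}, "b": {"y"}, "y": {"b"}, "c": {"z"}, "z": {"c"}}
Q = frozenset(SUCC)

def pre_exists(P):
    return frozenset(q for q, s in SUCC.items() if s & P)

def lfp(f):
    """Least fixpoint of a monotone f, iterated from the empty set."""
    X = frozenset()
    while (nxt := f(X)) != X:
        X = nxt
    return X

def gfp(f):
    """Greatest fixpoint of a monotone f, iterated from the full state set."""
    X = Q
    while (nxt := f(X)) != X:
        X = nxt
    return X

p = frozenset({"b"})
# mu X . p  u  Pre-exists(X): states from which p is reachable.
can_reach_p = lfp(lambda X: p | pre_exists(X))
# nu X . (Q \ p)  n  Pre-exists(X): states with an infinite path avoiding p.
can_avoid_p = gfp(lambda X: (Q - p) & pre_exists(X))
```

Nested formulas such as (8) are evaluated the same way, with the inner fixpoint recomputed for every candidate value of the enclosing variable.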

#### **2.2 The Considered Synthesis Problem**

The GR(1) synthesis problem asks to synthesize a winning strategy for the system player (player 1) for a given GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ or determine that no such strategy exists. This can be equivalently represented in terms of ω-languages, by asking for a system strategy $f^1$ over $H$ s.t.

$$
\emptyset \neq \mathcal{L}(H, f^1) \subseteq \overline{\mathcal{L}(H, \mathcal{F}_{\mathcal{A}})} \cup \mathcal{L}(H, \mathcal{F}_{\mathcal{G}}) .
$$

That is, the system wins on plays $\pi \in \mathcal{L}(H, f^1)$ if either $\pi \notin \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ or $\pi \in \mathcal{L}(H, \mathcal{F}_{\mathcal{A}}) \cap \mathcal{L}(H, \mathcal{F}_{\mathcal{G}})$. The only mechanism to ensure that *sufficiently* many computations will result from $f^1$ is the usage of the environment input, which enforces a minimal branching structure. However, the system could still win this game by *falsifying the assumptions*; i.e., by generating plays $\pi \notin \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ that prevent the environment from fulfilling its liveness properties.

We suggest an alternative view to the usage of the assumptions on the environment $\mathcal{F}_{\mathcal{A}}$ in a GR(1) game. The condition $\mathcal{F}_{\mathcal{A}}$ can be interpreted abstractly as modeling an underlying mechanism that ensures that the environment player (player 0) generates only inputs (possibly in response to observed outputs) that conform with the given assumption. In this context, we would like to ensure that the system (player 1) allows the environment, as much as possible, to fulfill its liveness and only *restricts* the environment behavior if needed to enforce the guarantees. We achieve this by forcing the system player to ensure that the environment is always able to play such that it fulfills its liveness, i.e.,

$$\text{pfx}(\mathcal{L}(H, f^1)) = \text{pfx}(\mathcal{L}(H, f^1) \cap \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})).$$

As the ⊇-inclusion trivially holds, the constraint is given by the ⊆-inclusion. Intuitively, the latter holds if every finite play $\alpha$ compliant with $f^1$ over $H$ can be extended (by a suitable environment strategy) to an infinite play $\pi$ compliant with $f^1$ that fulfills the environment liveness assumptions. It is easy to see that not every solution to the GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ (in the classical sense) satisfies this additional requirement. We therefore propose to synthesize a system strategy $f^1$ with the above properties, as summarized in the following problem statement.

*Problem 1.* Given a GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$, synthesize a system strategy $f^1$

$$\text{s.t.}\quad \emptyset \neq \mathcal{L}(H, f^1) \subseteq \overline{\mathcal{L}(H, \mathcal{F}_{\mathcal{A}})} \cup \mathcal{L}(H, \mathcal{F}_{\mathcal{G}}),\tag{7a}$$

$$\text{and} \quad \text{pfx}(\mathcal{L}(H, f^1)) = \text{pfx}(\mathcal{L}(H, f^1) \cap \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})) \tag{7b}$$

both hold, or verify that no such system strategy exists.

Problem 1 asks for a strategy $f^1$ s.t. every play $\pi$ compliant with $f^1$ over $H$ fulfills the system guarantees, i.e., $\pi \in \mathcal{L}(H, \mathcal{F}_{\mathcal{G}})$, if the environment fulfills its liveness properties, i.e., if $\pi \in \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ (from (7a)), while the latter always remains possible (by a suitably playing environment) due to (7b). Inspired by algorithms solving the supervisory controller synthesis problem for non-terminating processes [23,27], we propose a solution to Problem 1 in terms of a vectorized 4-nested fixed-point in the remaining part of this paper. We show that Problem 1 can be solved by a finite-memory strategy, if a solution exists.

We note that (7b) is not a linear-time but a branching-time property and can therefore not be "compiled away" into a different GR(1) or even ω-regular objective. Satisfaction of (7b) requires checking whether the set $\mathcal{F}_{\mathcal{A}}$ remains reachable from any reachable state in the game graph realizing $\mathcal{L}(H, f^1)$.<sup>1</sup>
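For a single assumption set, the reachability requirement described above can be illustrated on an explicit graph. The sketch below assumes that the closed loop of $H$ and $f^1$ is already given as a finite successor map (a hypothetical representation); it checks that the target set is reachable from every reachable state.

```python
def reachable(succ, init):
    """Forward reachability from init."""
    seen, stack = {init}, [init]
    while stack:
        q = stack.pop()
        for r in succ.get(q, ()):
            if r not in seen:
                seen.add(r)
                stack.append(r)
    return seen

def can_reach(succ, target):
    """Backward least fixpoint: states with some path into target."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for q, successors in succ.items():
            if q not in result and successors & result:
                result.add(q)
                changed = True
    return result

def target_always_reachable(succ, init, target):
    """True iff every state reachable from init can still reach target."""
    return reachable(succ, init) <= can_reach(succ, target)

# Hypothetical closed loops: in the first one the play may get stuck in u forever.
LOOP_WITH_TRAP = {"s": {"t", "u"}, "t": {"s"}, "u": {"u"}}
LOOP_WITHOUT_TRAP = {"s": {"t"}, "t": {"s"}, "u": {"u"}}
```

With target `{"t"}` and initial state `s`, the check fails for `LOOP_WITH_TRAP` and succeeds for `LOOP_WITHOUT_TRAP`. Both graphs contain plays that visit `t` infinitely often, which is why the difference cannot be observed on individual plays alone.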

#### **3 Algorithmic Solution for Singleton Winning Conditions**

We first consider the GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ with singleton winning conditions $\mathcal{F}_{\mathcal{A}} = \{F_{\mathcal{A}}\}$ and $\mathcal{F}_{\mathcal{G}} = \{F_{\mathcal{G}}\}$, i.e., $n = m = 1$. It is well known that a system winning strategy $f^1$ for this game can be synthesized by solving a three-color parity game over $H$. This can be expressed by the μ-calculus formula (see [15])

$$\varphi_3 := \nu Z \,.\, \mu Y \,.\, \nu X \,.\, (F_{\mathcal{G}} \cap \mathsf{Pre}^1(Z)) \cup \mathsf{Pre}^1(Y) \cup ((Q \setminus F_{\mathcal{A}}) \cap \mathsf{Pre}^1(X)). \tag{8}$$

It follows that $q_0 \in [\![\varphi_3]\!]$ if and only if the synthesis problem has a solution, and the winning strategy $f^1$ is obtained from a ranking argument over the sets computed during the evaluation of (8).

<sup>1</sup> It can indeed be expressed by the CTL<sup>∗</sup> formula $\mathsf{AG}\,\mathsf{EF}\,F_{\mathcal{A}}$ (see [13], Sect. 3.3.2).

To obtain a system strategy f <sup>1</sup> solving Problem 1 instead, we propose to extend (8) to a 4-nested fixed-point expressed by the μ-calculus formula

$$\begin{array}{c} \varphi_{4} = \nu Z \,.\, \mu Y \,.\, \nu X \,.\, \mu W \,. \\ \qquad (F_{\mathcal{G}} \cap \mathsf{Pre}^{1}(Z)) \cup \mathsf{Pre}^{1}(Y) \cup ((Q \setminus F_{\mathcal{A}}) \cap \mathsf{CondPre}(W, X \setminus F_{\mathcal{A}})). \end{array} \tag{9}$$

Compared to (8), this adds an innermost least fixed-point over $W$ and substitutes the last controllable pre-operator by the conditional one. Intuitively, this distinguishes between states from which player 1 can force visiting $F_{\mathcal{G}}$ and states from which player 1 can force avoiding $F_{\mathcal{A}}$. This is in contrast to (8) and makes it possible to exclude strategies with which player 1 wins by falsifying the assumptions.

The remainder of this section shows that $q_0 \in [\![\varphi_4]\!]$ if and only if Problem 1 has a solution, and that the winning strategy $f^1$ fulfilling (7a) and (7b) can be obtained from a ranking argument over the sets computed during the evaluation of (9).
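The difference between (8) and (9) can be observed on a small explicit example. The sketch below evaluates both formulas by naive nested iteration on a hypothetical six-state game (not from the paper) in which the system state `s0` may either move towards the assumption and guarantee states `e2`, `s2` or into a trap `e1`, `s1` that visits neither; all Python names are assumptions.

```python
# Hypothetical toy game: environment states e*, system states s*.
DELTA0 = {"e0": {"s0"}, "e1": {"s1"}, "e2": {"s2"}}          # delta^0
DELTA1 = {"s0": {"e1", "e2"}, "s1": {"e1"}, "s2": {"e0"}}    # delta^1
Q = frozenset(DELTA0) | frozenset(DELTA1)
F_A = frozenset({"e2"})   # singleton assumption set
F_G = frozenset({"s2"})   # singleton guarantee set

def pre_exists(P):
    return frozenset(q for q, s in {**DELTA0, **DELTA1}.items() if s & P)

def pre_1(P):
    return (frozenset(q for q, s in DELTA0.items() if s <= P)
            | frozenset(q for q, s in DELTA1.items() if s & P))

def cond_pre(P, P2):
    return pre_exists(P) & pre_1(P | P2)

def lfp(f):
    X = frozenset()
    while (nxt := f(X)) != X:
        X = nxt
    return X

def gfp(f):
    X = Q
    while (nxt := f(X)) != X:
        X = nxt
    return X

def phi3():
    """Formula (8): nu Z . mu Y . nu X . (...)."""
    return gfp(lambda Z: lfp(lambda Y: gfp(lambda X:
        (F_G & pre_1(Z)) | pre_1(Y) | ((Q - F_A) & pre_1(X)))))

def phi4():
    """Formula (9): nu Z . mu Y . nu X . mu W . (...)."""
    return gfp(lambda Z: lfp(lambda Y: gfp(lambda X: lfp(lambda W:
        (F_G & pre_1(Z)) | pre_1(Y) | ((Q - F_A) & cond_pre(W, X - F_A))))))
```

On this game `phi3()` returns all of `Q`, so the classical fixed-point also accepts the trap states `e1` and `s1`, from which the system wins only because the assumption can never be met again. `phi4()` returns `{"e0", "e2", "s0", "s2"}` and thereby rules out the move from `s0` into the trap.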

#### **Soundness**

We prove soundness of (9) by showing that every state $q \in [\![\varphi_4]\!]$ is winning for the system player. In view of Problem 1 this requires showing that there exists a system strategy $f^1$ s.t. all plays starting in a state $q \in [\![\varphi_4]\!]$ and evolving in accordance with $f^1$ result in an infinite play that fulfills (7a) and (7b).

We start by defining $f^1$ from a ranking argument over the iterations of (9). Consider the last iteration of the fixed-point in (9) over $Z$. As (9) terminates after this iteration we have $Z = Z^{\infty} = [\![\varphi_4]\!]$. Assume that the fixed point over $Y$ is reached after $k$ iterations. If $Y^i$ is the set obtained after the $i$-th iteration, we have that $Z^{\infty} = \bigcup_{i=0}^{k} Y^i$ with $Y^i \subseteq Y^{i+1}$, $Y^0 = \emptyset$ and $Y^k = Z^{\infty}$. Furthermore, let $X^i = Y^i$ denote the fixed-point of the iteration over $X$ resulting in $Y^i$ and denote by $W^i_j$ the set obtained in the $j$-th iteration over $W$ performed while using the value $X^i$ for $X$ and $Y^{i-1}$ for $Y$. Then it holds that $Y^i = X^i = \bigcup_{j=0}^{l_i} W^i_j$ with $W^i_j \subseteq W^i_{j+1}$, $W^i_0 = \emptyset$ and $W^i_{l_i} = Y^i$ for all $i \in [0; k]$.

Using these sets, we define a ranking for every state $q \in Z^{\infty}$ s.t.

$$\mathsf{rank}(q) = (i, j) \text{ iff } q \in \left( Y^i \setminus Y^{i-1} \right) \cap \left( W_j^i \setminus W_{j-1}^i \right) \text{ for } i, j > 0. \tag{10}$$

We order ranks lexicographically. It further holds that (see [21])

$$q \in D \quad \Leftrightarrow \ \mathsf{rank}(q) = (1, 1) \qquad \qquad \Leftrightarrow \ q \in F_{\mathcal{G}} \cap Z^{\infty} \tag{11a}$$

$$q \in E^i \quad \Leftrightarrow \ \mathsf{rank}(q) = (i, 1) \land i > 1 \quad \Leftrightarrow \ q \in (F_{\mathcal{A}} \setminus F_{\mathcal{G}}) \cap Z^{\infty} \tag{11b}$$

$$q \in R_j^i \quad \Leftrightarrow \ \mathsf{rank}(q) = (i, j) \land j > 1 \quad \Leftrightarrow \ q \in Z^{\infty} \setminus (F_{\mathcal{A}} \cup F_{\mathcal{G}}), \tag{11c}$$

where $D$, $E^i$ and $R^i_j$ denote the sets *added* to the winning state set by the first, second and third term of (9), respectively, in the corresponding iteration.

Figure 3 (left) shows a schematic representation of this construction for an example with $k = 3$, $l_1 = 4$, $l_2 = 2$ and $l_3 = 3$. The set $D = F_{\mathcal{G}} \cap Z^{\infty}$ is represented by the diamond at the top where the label (1, 1) denotes the associated rank (see (11a)). The ellipses represent the sets $E^i \subseteq (F_{\mathcal{A}} \setminus F_{\mathcal{G}}) \cap Z^{\infty}$, where the corresponding $i > 1$ is indicated by the associated rank $(i, 1)$. Due to the use of the controllable pre-operator in the first and second term of (9), it is ensured that progress out of $D$ and $E^i$ can be enforced by the system, indicated by the solid arrows. This is in contrast to all states in $R^i_j \subseteq Z^{\infty} \setminus (F_{\mathcal{A}} \cup F_{\mathcal{G}})$, which are represented by the rectangular shapes in Fig. 3 (left). These states allow the environment to increase the ranking (dashed lines) as long as $Z^{\infty} \setminus (F_{\mathcal{A}} \cup F_{\mathcal{G}})$ is not left and there exists a possible move to decrease the $j$-rank (dotted lines). While this does not strictly enforce progress, we see that whenever the environment plays such that states in $F_{\mathcal{A}}$ (i.e., the ellipses) are visited infinitely often (i.e., the environment fulfills its assumptions), the system can enforce progress w.r.t. the defined ranking and a state in $F_{\mathcal{G}}$ (i.e., the diamond shape) is eventually visited. The system is restricted to take the existing solid or dotted transitions in Fig. 3 (left). With this, it is easy to see that the constructed strategy is winning if the environment fulfills its assumptions, i.e., (7a) holds. However, to ensure that (7b) also holds, we need an additional requirement. This is necessary as the used construction also allows plays to cycle through the blue region of Fig. 3 (left) only, without necessarily visiting states in $F_{\mathcal{A}}$ infinitely often. However, if $\mathcal{L}(H, F_{\mathcal{G}}) \subseteq \mathcal{L}(H, F_{\mathcal{A}})$ we see that (7b) holds as well.
It should be noted that the latter is a sufficient condition, which can be easily checked symbolically on the problem instance, but not a necessary one.

**Fig. 3.** Schematic representation of the ranking defined in (10) (left) and in (16) (right). Diamond, ellipses and rectangles represent the sets $D$, $E^i$ and $R^i_j$, while blue, green and red indicate the sets $Y^1$, $Y^2 \setminus Y^1$ and $Y^3 \setminus Y^2$ (annotated by $a$/$ab$ for the right figure). Labels $(i, j)$ and $(a, i, b, j)$ indicate that all states $q$ associated with this set fulfill $\mathsf{rank}(q) = (i, j)$ and ${}^{ab}\mathsf{rank}(q) = (i, j)$, respectively. Solid, colored arcs indicate system-enforceable moves, dotted arcs indicate existence of environment or system transitions and dashed arcs indicate possible existence of environment transitions. (Color figure online)

Based on the ranking in (10) we define a memoryless system strategy $f^1 : Q^1 \cap Z^{\infty} \to Q^0 \subseteq \delta^1$ s.t. the rank is always decreased, i.e.,

$$q'=f^1(q)\Rightarrow \begin{cases} \mathsf{rank}(q') < \mathsf{rank}(q), & \mathsf{rank}(q) > (1,1) \\ q'\in Z^{\infty}, & \text{otherwise} \end{cases}.\tag{12}$$

The next theorem shows that this strategy indeed solves Problem 1.

**Theorem 1.** *Let* $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ *be a GR(1) game with singleton winning conditions* $\mathcal{F}_{\mathcal{A}} = \{F_{\mathcal{A}}\}$ *and* $\mathcal{F}_{\mathcal{G}} = \{F_{\mathcal{G}}\}$*. Suppose* $f^1$ *is the system strategy in* (12) *based on the ranking in* (10)*. Then it holds for all* $q \in [\![\varphi_4]\!]$ *that*<sup>2</sup>

$$\mathcal{L}_q(H, f^1) \subseteq \overline{\mathcal{L}_q(H, \mathcal{F}_{\mathcal{A}})} \cup \mathcal{L}_q(H, \mathcal{F}_{\mathcal{G}}),\tag{13a}$$

$$\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_{\mathcal{G}}) \neq \emptyset, \text{ and} \tag{13b}$$

$$\mathcal{L}_q(H, \mathcal{F}_{\mathcal{G}}) \subseteq \mathcal{L}_q(H, \mathcal{F}_{\mathcal{A}}) \Rightarrow \text{pfx}(\mathcal{L}_q(H, f^1)) = \text{pfx}(\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_{\mathcal{A}})).\tag{13c}$$
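The ranking (10) and the rank-decreasing strategy (12) can be sketched on an explicit example. The code below is an illustration under assumptions: it uses a hypothetical six-state game, recomputes $Z^{\infty}$ by naive iteration, records the stages $Y^i$ and $W^i_j$ of the last iteration, and then picks for every system state a successor of smaller rank. All names are hypothetical, and ties are resolved by smallest rank, which (12) does not prescribe.

```python
# Hypothetical toy game: environment states e*, system states s*.
DELTA0 = {"e0": {"s0"}, "e1": {"s1"}, "e2": {"s2"}}
DELTA1 = {"s0": {"e1", "e2"}, "s1": {"e1"}, "s2": {"e0"}}
Q = frozenset(DELTA0) | frozenset(DELTA1)
F_A, F_G = frozenset({"e2"}), frozenset({"s2"})

def pre_exists(P):
    return frozenset(q for q, s in {**DELTA0, **DELTA1}.items() if s & P)

def pre_1(P):
    return (frozenset(q for q, s in DELTA0.items() if s <= P)
            | frozenset(q for q, s in DELTA1.items() if s & P))

def body(Z, Y, X, W):
    """Right-hand side of (9) for fixed values of Z, Y, X and W."""
    cond_pre = pre_exists(W) & pre_1(W | (X - F_A))
    return (F_G & pre_1(Z)) | pre_1(Y) | ((Q - F_A) & cond_pre)

def stages(f):
    """Strictly increasing iterates of a least fixpoint, starting with f(empty)."""
    out, X = [], frozenset()
    while (nxt := f(X)) != X:
        out.append(nxt)
        X = nxt
    return out

def lfp(f):
    iterates = stages(f)
    return iterates[-1] if iterates else frozenset()

def gfp(f):
    X = Q
    while (nxt := f(X)) != X:
        X = nxt
    return X

def y_step(Z):
    return lambda Y: gfp(lambda X: lfp(lambda W: body(Z, Y, X, W)))

Z_INF = gfp(lambda Z: lfp(y_step(Z)))

def ranking():
    """rank(q) = (i, j) for the first stages Y^i and W^i_j that contain q."""
    rank, y_prev = {}, frozenset()
    for i, y_i in enumerate(stages(y_step(Z_INF)), start=1):
        w_stages = stages(lambda W: body(Z_INF, y_prev, y_i, W))
        for j, w_j in enumerate(w_stages, start=1):
            for q in w_j:
                rank.setdefault(q, (i, j))
        y_prev = y_i
    return rank

def strategy(rank):
    """Memoryless choice per (12): stay in Z_INF at rank (1, 1), else decrease the rank."""
    f1 = {}
    for q in DELTA1:
        if q not in rank:
            continue  # q is not winning; no move is defined
        if rank[q] == (1, 1):
            options = sorted(DELTA1[q] & Z_INF)
        else:
            options = sorted((r for r in DELTA1[q] if r in rank and rank[r] < rank[q]),
                             key=lambda r: rank[r])
        f1[q] = options[0]
    return f1
```

On this game the guarantee state `s2` receives rank (1, 1), the assumption state `e2` rank (2, 1), and `s0`, `e0` ranks (2, 2) and (2, 3), in line with (11a)–(11c). The extracted strategy sends `s0` to `e2` rather than into the unranked trap `e1`.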

#### **Completeness**

We show completeness of (9) by establishing that every state $q \in Q \setminus [\![\varphi_4]\!] = [\![\overline{\varphi}_4]\!]$ is losing for the system player. In view of Problem 1 this requires showing that for all $q \in [\![\overline{\varphi}_4]\!]$ and all system strategies $f^1$ either (7a) or (7b) does not hold. This is formalized in [21] by first negating the fixed-point in (9) and deriving the induced ranking of this negated fixed-point. Using this ranking, we first show that the environment can (i) render the negated winning set $\overline{Z}^{\infty}$ invariant and (ii) always enforce the play to visit $F_{\mathcal{G}}$ only finitely often, resulting in a violation of the guarantees. Using these observations we finally show that whenever (7a) holds for an arbitrary system strategy $f^1$ starting in $[\![\overline{\varphi}_4]\!]$, then (7b) cannot hold. With this, completeness, as formalized in the following theorem, directly follows.

**Theorem 2.** *Let* $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ *be a GR(1) game with singleton winning conditions* $\mathcal{F}_{\mathcal{A}} = \{F_{\mathcal{A}}\}$ *and* $\mathcal{F}_{\mathcal{G}} = \{F_{\mathcal{G}}\}$*. Then it holds for all* $q \in [\![\overline{\varphi}_4]\!]$ *and all system strategies* $f^1$ *over* $H$ *that either*

$$\emptyset \ne \mathcal{L}_q(H, f^1) \subseteq \overline{\mathcal{L}_q(H, \mathcal{F}_{\mathcal{A}})} \cup \mathcal{L}_q(H, \mathcal{F}_{\mathcal{G}}), \text{ or } \tag{14a}$$

$$\text{pfx}(\mathcal{L}_q(H, f^1)) = \text{pfx}(\mathcal{L}_q(H, f^1) \cap \mathcal{L}_q(H, \mathcal{F}_\mathcal{A})) \text{ does not hold.} \tag{14b}$$

<sup>2</sup> Given a state $q \in Q = Q^0 \cup Q^1$ we use the subscript $q$ to denote that the respective set of plays is defined by using $q$ as the initial state of $H$.

#### **A Solution for Problem** 1

We note that the additional assumption in Theorem 1 is required only to ensure that the resulting strategy fulfills (7b). Suppose that this assumption holds for the initial state $q_0$ of $H$. That is, consider a GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ with singleton winning conditions $\mathcal{F}_{\mathcal{A}} = \{F_{\mathcal{A}}\}$ and $\mathcal{F}_{\mathcal{G}} = \{F_{\mathcal{G}}\}$ s.t. $\mathcal{L}(H, F_{\mathcal{G}}) \subseteq \mathcal{L}(H, F_{\mathcal{A}})$. Then it follows from Theorems 1 and 2 that Problem 1 has a solution iff $q_0 \in [\![\varphi_4]\!]$. Furthermore, if $q_0 \in [\![\varphi_4]\!]$, based on the intermediate values maintained for the computation of $\varphi_4$ in (9) and the ranking defined in (10), we can construct $f^1$ as in (12), which wins the GR(1) condition in (7a) and is non-conflicting, as in (7b).

We can check symbolically whether $\mathcal{L}(H, F_{\mathcal{G}}) \subseteq \mathcal{L}(H, F_{\mathcal{A}})$. For this we construct a game graph $H'$ from $H$ by removing all states in $F_{\mathcal{A}}$, and then check whether $\mathcal{L}(H', F_{\mathcal{G}})$ is empty. The latter is decidable in logarithmic space and polynomial time. If this check fails, then $\mathcal{L}(H, F_{\mathcal{G}}) \not\subseteq \mathcal{L}(H, F_{\mathcal{A}})$. Furthermore, we can replace $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}})$ in (7a) by $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}}) \cap \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ without affecting the restriction (7a) imposes on the choice of $f^1$. Given singleton winning conditions $\mathcal{F}_{\mathcal{G}}$ and $\mathcal{F}_{\mathcal{A}}$, we see that $\mathcal{L}(H, F_{\mathcal{G}}) \cap \mathcal{L}(H, F_{\mathcal{A}}) = \mathcal{L}(H, \{F_{\mathcal{G}}, F_{\mathcal{A}}\})$ and it trivially holds that $\mathcal{L}(H, \{F_{\mathcal{G}}, F_{\mathcal{A}}\}) \subseteq \mathcal{L}(H, F_{\mathcal{A}})$. That is, we fulfill the conditional by replacing the system guarantee $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}})$ by $\mathcal{L}(H, \{F_{\mathcal{G}}, F_{\mathcal{A}}\})$. However, this results in a GR(1) synthesis problem with $m = 1$ and $n = 2$, which we discuss next.
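The inclusion check described above amounts to a Büchi emptiness test on the restricted graph. The following sketch implements it with the standard fixed-point $\nu X.\,\mu Y.\,(F \cap \mathsf{Pre}^{\exists}(X)) \cup \mathsf{Pre}^{\exists}(Y)$ on an explicit successor map. The graph and all names are hypothetical, and treating an initial state inside the removed set as "no play remains" is an assumption of this sketch.

```python
def restrict(succ, removed):
    """Game graph H' obtained by deleting the states in `removed`."""
    return {q: s - removed for q, s in succ.items() if q not in removed}

def buchi_states(succ, F):
    """States with an infinite path that visits F infinitely often."""
    def pre(P):
        return {q for q, s in succ.items() if s & P}
    X = set(succ)
    while True:
        Y = set()
        while (nxt := (F & pre(X)) | pre(Y)) != Y:
            Y = nxt
        if Y == X:
            return X
        X = Y

def inclusion_holds(succ, init, F_G, F_A):
    """True iff no play of H' from init visits F_G infinitely often."""
    h_prime = restrict(succ, F_A)
    return init not in buchi_states(h_prime, F_G - F_A)

# Hypothetical game graph: e* are environment states, s* are system states.
SUCC = {"e0": {"s0"}, "s0": {"e1", "e2"}, "e1": {"s1"},
        "s1": {"e1"}, "e2": {"s2"}, "s2": {"e0"}}
```

With `F_A = {"e2"}`, the call `inclusion_holds(SUCC, "e0", {"s2"}, {"e2"})` succeeds because `s2` can only be entered through `e2`. For the guarantee set `{"s1"}` it fails, since the play that cycles between `e1` and `s1` visits the guarantee forever without ever passing through the assumption set.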

#### **4 Algorithmic Solution for GR(1) Winning Conditions**

We now consider a general GR(1) game $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ with $\mathcal{F}_{\mathcal{A}} = \{{}^1F_{\mathcal{A}}, \dots, {}^mF_{\mathcal{A}}\}$ and $\mathcal{F}_{\mathcal{G}} = \{{}^1F_{\mathcal{G}}, \dots, {}^nF_{\mathcal{G}}\}$ s.t. $n, m > 1$. The known fixed-point for solving GR(1) games in [6] rewrites the three-nested fixed-point in (8) in a vectorized version, which induces an order on the guarantee sets in $\mathcal{F}_{\mathcal{G}}$ and adds a disjunction over all assumption sets in $\mathcal{F}_{\mathcal{A}}$ to every line of this vectorized fixed-point. Adapting the same idea to the 4-nested fixed-point algorithm (9) results in

$$\varphi_4^v = \nu \begin{bmatrix} {}^1Z \\ {}^2Z \\ \vdots \\ {}^nZ \end{bmatrix} . \begin{bmatrix} \mu\,{}^1Y \,.\, \left( \bigcup_{b=1}^m \nu\,{}^{1b}X \,.\, \mu\,{}^{1b}W \,.\, {}^{1b}\Omega \right) \\ \mu\,{}^2Y \,.\, \left( \bigcup_{b=1}^m \nu\,{}^{2b}X \,.\, \mu\,{}^{2b}W \,.\, {}^{2b}\Omega \right) \\ \vdots \\ \mu\,{}^nY \,.\, \left( \bigcup_{b=1}^m \nu\,{}^{nb}X \,.\, \mu\,{}^{nb}W \,.\, {}^{nb}\Omega \right) \end{bmatrix},\tag{15}$$

where ${}^{ab}\Omega = ({}^aF_{\mathcal{G}} \cap \mathsf{Pre}^1({}^{a^+}Z)) \cup \mathsf{Pre}^1({}^aY) \cup ((Q \setminus {}^bF_{\mathcal{A}}) \cap \mathsf{CondPre}({}^{ab}W, {}^{ab}X \setminus {}^bF_{\mathcal{A}}))$ and $a^+$ denotes $(a \bmod n) + 1$.

The remainder of this section shows how soundness and completeness carry over from the 4-nested fixed-point algorithm (9) to its vectorized version in (15).

#### **Soundness and Completeness**

We refer to intermediate sets obtained during the computation of the fixpoints by similar notations as in Sect. 3. For example, the set ${}^aY^i$ is the $i$-th approximation of the fixpoint computing ${}^aY$ and ${}^{ab}W^i_j$ is the $j$-th approximation of ${}^{ab}W$ while computing the $i$-th approximation of ${}^aY$, i.e., computing ${}^aY^i$ and using ${}^aY^{i-1}$. Similar to the above, we define a mode-based rank for every state $q \in {}^aZ^{\infty}$; we track the currently chased guarantee $a \in [1; n]$ (similar to [6]) and the currently avoided assumption set $b \in [1; m]$ as an additional internal mode. In analogy to (10) we define

$${}^{ab}\mathsf{rank}(q) = (i,j)\text{ iff }q \in \left({}^{a}Y^{i} \setminus {}^{a}Y^{i-1}\right) \cap \left({}^{ab}W^{i}_{j} \setminus {}^{ab}W^{i}_{j-1}\right)\text{ for }i,j > 0. \tag{16}$$

Again, we order ranks lexicographically, and, in analogy to (11a), (11b) and (11c), we have

$$q \in {}^aD \quad \Leftrightarrow \ {}^{a\cdot}\mathsf{rank}(q) = (1, 1) \qquad \qquad \Rightarrow \ q \in {}^aF_{\mathcal{G}},\tag{17a}$$

$$q \in {}^aE^i \quad \Leftrightarrow \ {}^{a\cdot}\mathsf{rank}(q) = (i, 1) \land i > 1,\tag{17b}$$

$$q \in {}^{ab}R_j^i \quad \Leftrightarrow \ {}^{ab}\mathsf{rank}(q) = (i,j) \wedge j > 1 \qquad \Rightarrow \ q \notin {}^bF_{\mathcal{A}}.\tag{17c}$$

The sets ${}^aY^i$, ${}^{ab}W^i_j$, ${}^aD$, ${}^aE^i$ and ${}^{ab}R^i_j$ are interpreted in direct analogy to Sect. 3, where $a$ and $b$ annotate the used line and disjunct in (15).

Figure 3 (right) shows a schematic representation of the ranking for an example with ${}^ak = 3$, ${}^{a1}l_1 = 0$, ${}^{a2}l_1 = 4$, ${}^{a3}l_1 = 2$, ${}^{a\cdot}l_2 = 2$, ${}^{a1}l_3 = 3$, ${}^{a2}l_3 = 0$, and ${}^{a3}l_3 = 2$. Again, the set ${}^aD \subseteq {}^aF_{\mathcal{G}}$ is represented by the diamond at the top of the figure. Similarly, all ellipses represent sets ${}^aE^i$ added in the $i$-th iteration over line $a$ of (15). Again, progress out of ellipses can be enforced by the system, indicated by the solid arrows leaving those shapes. However, this might not preserve the current $b$ mode. It might be the environment choosing which assumption to avoid next. Further, the environment might choose to change the $b$ mode along with decreasing the $i$-rank, as indicated by the colored dashed lines<sup>3</sup>. Finally, the interpretation of the sets represented by rectangular shapes in Fig. 3 (right), corresponding to (17c), is in direct analogy to the case with singleton winning conditions. It should be noted that this is the only place where we preserve the current $b$ mode when constructing a strategy.

Using this intuition we define a system strategy that uses enforceable and existing transitions to decrease the rank if possible and preserves the current a mode until the diamond shape is reached. The b mode is only preserved within rectangular sets. This is formalized by a strategy

$$f^1: \bigcup_{a \in [1;n]} \left( (Q^1 \cap {}^aZ^\infty) \times a \times [1; m] \right) \to Q^0 \times [1; n] \times [1; m] \tag{18a}$$

s.t. $(q', \cdot, \cdot) = f^1(q, \cdot, \cdot)$ implies $q' \in \delta^1(q)$ and $(q', a', b') = f^1(q, a, b)$ implies

$$\begin{cases} q' \in {}^{a^+}Z^{\infty} \wedge a' = a^+, & {}^{ab}\mathsf{rank}(q) = (1,1) \\ {}^{a'b'}\mathsf{rank}(q') \le (i-1, \cdot) \wedge a' = a, & {}^{ab}\mathsf{rank}(q) = (i,1),\ i > 1 \\ {}^{a'b'}\mathsf{rank}(q') \le (i, j-1) \wedge a' = a \wedge b' = b, & {}^{ab}\mathsf{rank}(q) = (i,j),\ j > 1 \end{cases} \tag{18b}$$

We say that a play $\pi$ over $H$ is compliant with $f^1$ if there exist mode traces $\alpha \in [1; n]^{\omega}$ and $\beta \in [1; m]^{\omega}$ s.t. for all $k \in \mathbb{N}$ it holds that $(\pi(2k+2), \alpha(2k+2), \beta(2k+2)) = f^1(\pi(2k+1), \alpha(2k+1), \beta(2k+1))$, and (i) $\alpha(2k+1) = \alpha(2k)^+$ if ${}^{ab}\mathsf{rank}(\pi(2k+1)) = (1, 1)$, (ii) $\alpha(2k+1) = \alpha(2k)$ if ${}^{ab}\mathsf{rank}(\pi(2k+1)) = (i, 1)$, $i > 1$, and (iii) $\alpha(2k+1) = \alpha(2k)$ and $\beta(2k+1) = \beta(2k)$ if ${}^{ab}\mathsf{rank}(\pi(2k+1)) = (i, j)$, $j > 1$.

<sup>3</sup> The strategy extraction in (18a) and (18b) prevents the system from choosing a different $b$ mode. The strategy choice could be optimized w.r.t. fast progress towards ${}^aF_{\mathcal{G}}$ in such cases.

With this it is easy to see that the intuition behind Theorem 1 directly carries over to every line of (15). Additionally, using $\mathsf{Pre}^1({}^{a^+}Z)$ in ${}^aD$ makes it possible to cycle through all the lines of (15), which ensures that the constructed system strategy attempts to reach every set ${}^aF_{\mathcal{G}} \in \mathcal{F}_{\mathcal{G}}$ in a pre-defined order. See [21] for a formalization of this intuition and a detailed proof.

To prove completeness, it is also shown in [21] that the negation of (15) can be over-approximated by negating every line separately. Therefore, the reasoning for every line of the negated fixed-point carries over from Sect. 3, resulting in the analogous completeness result. With this we obtain soundness and completeness in direct analogy to Theorems 1–2, formalized in Theorem 3.

**Theorem 3.** *Let* $(H, \mathcal{F}_{\mathcal{A}}, \mathcal{F}_{\mathcal{G}})$ *be a GR(1) game with* $\mathcal{F}_{\mathcal{A}} = \{{}^1F_{\mathcal{A}}, \dots, {}^mF_{\mathcal{A}}\}$ *and* $\mathcal{F}_{\mathcal{G}} = \{{}^1F_{\mathcal{G}}, \dots, {}^nF_{\mathcal{G}}\}$*. Suppose* $f^1$ *is the system strategy in* (18a) *and* (18b) *based on the ranking in* (16)*. Then it holds for all* $q \in [\![\varphi_4^v]\!]$ *that* (13a)*,* (13b) *and* (13c) *hold. Furthermore, it holds for all* $q \notin [\![\varphi_4^v]\!]$ *and all system strategies* $f^1$ *over* $H$ *that either* (14a) *or* (14b) *does not hold.*

#### **A Solution for Problem** 1

Given that $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}}) \subseteq \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ it follows from Theorem 3 that Problem 1 has a solution iff $q_0 \in [\![\varphi_4^v]\!]$. Furthermore, if $q_0 \in [\![\varphi_4^v]\!]$ we can construct $f^1$ that wins the GR(1) condition in (7a) and is non-conflicting, as in (7b).

Using a similar construction as in Sect. 3, we can symbolically check whether $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}}) \subseteq \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$. For this, we construct a new game graph $H_b$ for every ${}^bF_{\mathcal{A}}$, $b \in [1; m]$, by removing the latter set from the state set of $H$ and checking whether $\mathcal{L}(H_b, \mathcal{F}_{\mathcal{G}})$ is empty. If some of these $m$ checks fail, we have $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}}) \not\subseteq \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$. Now observe that by checking every ${}^bF_{\mathcal{A}}$ separately, we know which assumption sets are not necessarily visited by infinite runs that visit all ${}^aF_{\mathcal{G}}$ infinitely often, and can collect them in the set $\mathcal{F}^{\text{failed}}_{\mathcal{A}}$. Using the same reasoning as in Sect. 3, we can simply add the set $\mathcal{F}^{\text{failed}}_{\mathcal{A}}$ to the system guarantee set to obtain an equivalent synthesis problem which is solvable by the given algorithm, if it is realizable. More precisely, consider the new system guarantee set $\mathcal{F}'_{\mathcal{G}} = \mathcal{F}_{\mathcal{G}} \cup \mathcal{F}^{\text{failed}}_{\mathcal{A}}$ and observe that $\mathcal{L}(H, \mathcal{F}'_{\mathcal{G}}) \subseteq \mathcal{L}(H, \mathcal{F}_{\mathcal{A}})$ by definition, and therefore substituting $\mathcal{L}(H, \mathcal{F}_{\mathcal{G}})$ by $\mathcal{L}(H, \mathcal{F}'_{\mathcal{G}})$ in (7a) does not change the satisfaction of the given inclusion.

### **5 Complexity Analysis**

We show that the search for a more elaborate strategy does not affect the worst-case complexity. In Sect. 6 we show that this is also the case in practice. We state this complexity formally below.

**Theorem 4.** *Let* $(H, \mathcal{F}_A, \mathcal{F}_G)$ *be a GR(1) game. We can check whether there is a winning non-conflicting strategy* $f^1$ *by a symbolic algorithm that performs* $O(|Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$ *next step computations and by an enumerative algorithm that works in time* $O(m |Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$*, where* $m$ *is the number of transitions of the game.*

*Proof.* Each line of the fixed-point is iterated $O(|Q|^2)$ times [8]. As there are $|\mathcal{F}_G| |\mathcal{F}_A|$ lines the upper bound follows. As we have to compute $|\mathcal{F}_G| |\mathcal{F}_A|$ different ranks for each state, it follows that the complexity is $O(m |Q|^2 |\mathcal{F}_G| |\mathcal{F}_A|)$.

We note that *enumeratively* our approach is theoretically worse than the classical approach to GR(1). This follows from the straightforward reduction to the rank computation in the rank lifting algorithm and the relative complexity of the new rank when compared to the general GR(1) rank. We conjecture that more complex approaches, e.g., through a reduction to a parity game and the usage of other enumerative algorithms, could eliminate this gap.

### **6 Experiments**

We have implemented the 4-nested fixed-point algorithm in (15) and the corresponding strategy extraction in (18a) and (18b). It is available as an extension to the GR(1) synthesis tool slugs [14]. In this section we show how this algorithm (called 4FP) performs in comparison to the usual 3-nested fixed-point algorithm for GR(1) synthesis (called 3FP) available in slugs. All experiments were run on a computer with an Intel i5 processor running an x86 Linux at 2 GHz with 8 GB of memory.

We first run both algorithms on a benchmark set obtained from the maze example in the introduction by changing the number of rows and columns of the maze. We first increased the number of lines in the maze and added a goal state for both the obstacle and the robot per line. This results in a maze where, in the first and last column, system and environment goals alternate and all adjacent cells are separated by a horizontal wall. Hence, both players need to cross the one-cell wide white space in the middle infinitely often to visit all their goal states infinitely often. The computation times and the number of states in the resulting strategy are shown in Table 1, upper part, columns 3–6. Interestingly, we see that the 3FP always returns a strategy that blocks the environment. In contrast, the non-conflicting strategies computed by the 4FP are larger (in number of states) and are computed about 10 times slower than those of the 3FP (compare columns 3–4 and 5–6). When increasing the number of columns instead (lower part of Table 1), the number of goals is unaffected. We made the maze wider and left only a one-cell wide passage in the middle of the maze to allow crossings between its upper and lower row. Still, the 3FP only returns strategies that falsify the assumption, which have fewer states and are computed much faster than the environment-respecting strategy returned by the 4FP. Unfortunately, the speed of computing a strategy or its size is immaterial if the winning strategy so computed wins only by falsifying assumptions.

To rule out the discrepancy between the two algorithms w.r.t. the size of strategies, we slightly modified the above maze benchmark s.t. the environment assumptions are not falsifiable anymore. We increased the capabilities of the obstacle by allowing it to move at most 2 steps in each round and to "jump over" the robot. Under these assumptions we repeated the above experiments. The computation times and the number of states in the resulting strategy are shown in Table 1, columns 9–12. We see that in this case the sizes of the strategies computed by the two algorithms are more similar. The larger number for the 4FP is due to the fact that we have to track both the a and the b mode, possibly resulting in multiple copies of the same a-mode state. We see that the state difference decreases with the number of goals (upper part of Table 1, columns 9–12) and increases with the number of (non-goal) states (lower part of Table 1, columns 9–12). In both cases, the 3FP still computes faster, but the difference decreases with the number of goals.

In addition to the 3FP and the 4FP we have also tested a sound but incomplete heuristic, which avoids the disjunction over all b's in every line of (15) by only investigating a = b. The state count and computation times for this heuristic are shown in Table 1, columns 7–8 for the original maze benchmark, and in columns 13–14 for the modified one. We see that in both cases the heuristic only returns a winning strategy if the maze is not wider than 3 cells. This is due to the fact that in all other cases the robot cannot prevent the obstacle from attaining a particular assumption state until the robot has moved from one goal to the next. The 4FP handles this problem by changing between avoided assumptions in between visits to different goals. Intuitively, the computation times and state counts for the heuristic should be smaller than for the 4FP, as the exploration of the disjunction over b's is avoided, which is true for many scenarios of the considered benchmark. It should however be noted that this is not always the case (compare e.g. line 3, columns 6 and 8). This stems from the fact that restricting the synthesis to avoiding one particular assumption might require more iterations over W and Y within the fixed-point computation.

**Table 1.** Experimental results for the maze benchmark. The size of the maze is given in columns/lines, the number of goals is given per player. The states are counted for the returned winning strategies. Strategies preventing the environment from fulfilling its goals are indicated by a <sup>∗</sup>. Recorded computation times are rounded wall-clock times.


### **7 Discussion**

We believe the requirement that a winning strategy be *non-conflicting* is a simple way to disallow strategies that win by actively preventing the environment from satisfying its assumptions, without significantly changing the theoretical formulation of reactive synthesis (e.g., by adding different winning conditions or new notions of equilibria). It is not a trace property, but our main results show that adding this requirement retains the algorithmic niceties of GR(1) synthesis: in particular, symbolic algorithms have the same asymptotic complexity.

However, non-conflictingness makes the implicit assumption of a "maximally flexible" environment: it is possible that because of unmodeled aspects of the environment strategy, it is not possible for the environment to satisfy its specifications in the precise way allowed by a non-conflicting strategy. In the maze example discussed in Sect. 1, the environment needs to move the obstacle to precisely the goal cell which is currently rendered reachable by the system. If the underlying dynamics of the obstacle require it to go back to the lower left from state $q_3$ before proceeding to the upper right (e.g., due to a required battery recharge), the synthesized robot strategy prevents the obstacle from doing so.

Finally, if there is no non-conflicting winning strategy, one could look for a "minimally violating" strategy. We leave this for future work. Additionally, we leave for future work the consideration of non-conflictingness for general LTL specifications or (efficient) fragments thereof.

#### **References**



**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **StocHy: Automated Verification and Synthesis of Stochastic Processes**

Nathalie Cauchi and Alessandro Abate

Department of Computer Science, University of Oxford, Oxford, UK nathalie.cauchi@cs.ox.ac.uk

**Abstract.** StocHy is a software tool for the quantitative analysis of discrete-time *stochastic hybrid systems* (shs). StocHy accepts a high-level description of stochastic models and constructs an equivalent shs model. The tool allows the user to (i) simulate the shs evolution over a given time horizon, and to automatically construct formal abstractions of the shs. Abstractions are then employed for (ii) formal verification or (iii) control (policy, strategy) synthesis. StocHy allows for modular modelling, and has separate simulation, verification and synthesis engines, which are implemented as independent libraries. This allows for libraries to be easily used and for extensions to be easily built. The tool is implemented in c++ and employs manipulations based on vector calculus, the use of sparse matrices, the symbolic construction of probabilistic kernels, and multi-threading. Experiments show StocHy's markedly improved performance when compared to existing abstraction-based approaches: in particular, StocHy beats state-of-the-art tools in terms of precision (abstraction error) and computational effort, and finally attains scalability to large-sized models (12 continuous dimensions). StocHy is available at www.gitlab.com/natchi92/StocHy. Data or code related to this paper is available at: [31].

### **1 Introduction**

*Stochastic hybrid systems* (shs) are a rich mathematical modelling framework capable of describing systems with complex dynamics, where uncertainty and hybrid (that is, both continuous and discrete) components are relevant. Whilst earlier instances of shs have a long history, shs proper have been thoroughly investigated only from the mid 2000s, and have been most recently applied to the study of complex systems, both engineered and natural. Amongst engineering case studies, shs have been used for modelling and analysis of micro grids [29], smart buildings [23], avionics [7], and automation of medical devices [3]. A benchmark for shs is also described in [10]. However, a wider adoption of shs in real-world applications is stymied by a few factors: (i) the complexity associated with modelling shs; (ii) the generality of their mathematical framework, which requires an arsenal of advanced and diverse techniques to analyse them; and (iii) the undecidability of verification/synthesis problems over shs and the curse of dimensionality associated with their approximations.

This paper introduces a new software tool, StocHy, which is aimed at simplifying both the modelling of shs and their analysis, and which targets the wider adoption of shs, also by non-expert users. With focus on the three limiting factors above, StocHy allows the user to describe shs by parsing or extending well-known and widely used state-space models; it then automatically generates a standard shs model and formats it for analysis. StocHy can (i) perform verification tasks, e.g., compute the probability of staying within a certain region of the state space from a given set of initial conditions; (ii) automatically synthesise policies (strategies) maximising this probability; and (iii) simulate the shs evolution over time. StocHy is implemented in c++ and is modular, making it both extendible and portable.

**Related work.** There exist only a few tools that can handle (classes of) shs. Of much inspiration for this contribution, faust<sup>2</sup> [28] generates abstractions for uncountable-state discrete-time stochastic processes, natively supporting shs models with a single discrete mode and finite actions, and performs verification of reachability-like properties and corresponding synthesis of policies. faust<sup>2</sup> is naïvely implemented in matlab and lacks scalability to large models. The modest toolset [18] allows to model and to analyse classes of continuous-time shs, particularly probabilistic hybrid automata (pha) that combine probabilistic discrete transitions with deterministic evolution of the continuous variables. The tool for stochastic and dynamically coloured Petri nets (sdcpn) [13] supports compositional modelling of pha and focuses on simulation via Monte Carlo techniques. The existing tools highlight the need for new software that allows for (i) straightforward and general shs modelling construction and (ii) scalable automated analysis.

**Contributions.** The StocHy tool newly enables

	- for discrete-time, continuous-space models with additive disturbances, and possibly with multiple discrete modes, we employ formal abstractions as general Markov chains or Markov decision processes [28]; StocHy improves techniques in the faust<sup>2</sup> tool by simplifying the input model description, by employing sparse matrices to manipulate the transition probabilities and by reducing the computational time needed to generate the abstractions.
	- for models with a finite number of actions, we employ interval Markov decision processes and the model checking framework in [22]; StocHy provides a novel abstraction algorithm allowing for efficient computation of the abstract model, by means of an adaptive and sequential refining of the underlying abstraction. We show that we are able to generate significantly smaller abstraction errors and to verify models with up to 12 continuous variables.
	- stochastic dynamic programming; StocHy exploits the use of symbolic kernels.

This contribution is structured as follows: Sect. 2 crisply presents the theoretical underpinnings (modelling and analysis) for the tool. We provide an overview of the implementation of StocHy in Sect. 3. We highlight features and use of StocHy by a set of experimental evaluations in Sect. 4: we provide four different case studies that highlight the applicability, ease of use, and scalability of StocHy. Instructions for executing all the case studies are given in this paper and on a Wiki page that accompanies the StocHy distribution.

### **2 Theory: Models, Abstractions, Simulations**

#### **2.1 Models - Discrete-Time Stochastic Hybrid Systems**

StocHy supports the modelling of the following general class of shs [1,4].

**Definition 1.** *A* shs *[4] is a discrete-time model defined as the tuple*

$$\mathcal{H} = (\mathcal{Q}, n, \mathcal{U}, T_x, T_q), \quad \text{where} \tag{1}$$


In this model the discrete component takes values in a finite set $\mathcal{Q}$ of modes (a.k.a. locations), each endowed with a continuous domain (the Euclidean space $\mathbb{R}^n$). As such, a point $s$ over the hybrid state space $\mathcal{D}$ is a pair $(q, x)$, where $q \in \mathcal{Q}$ and $x \in \mathbb{R}^n$. The semantics of transitions at any point over a discrete time domain are as follows: given a point $s \in \mathcal{D}$, the discrete state is chosen from $T_q$, and depending on the selected mode $q \in \mathcal{Q}$ the continuous state is updated according to the probabilistic law $T_x$. Non-determinism in the form of actions can affect both discrete and continuous transitions.

*Remark 1.* A rigorous characterisation of shs can be found in [1], which introduces a general class of models with probabilistic resets and a hybrid actions space. Whilst we can deal with general shs models, in the case studies of this paper we focus on special instances, as described next.

*Remark 2 (Special instance).* In Case Study 2 (see Sect. 4.2) we look at models where actions are associated to a deterministic selection of locations, namely $T_q : \mathcal{U} \to \mathcal{Q}$, and $\mathcal{U}$ is a finite set of actions.

*Remark 3 (Special instance).* In Case Study 4 (Sect. 4.4) we consider non-linear dynamical models with bilinear terms, which are characterised for any $q \in \mathcal{Q}$ by $x_{k+1} = A_q x_k + B_q u_k + x_k \sum_{i=1}^{v} N_{q,i} u_{i,k} + G_q w_k$, where $k \in \mathbb{N}$ represents the discrete time index, $A_q, B_q, G_q$ are appropriately sized matrices, $N_{q,i}$ represents the bilinear influence of the $i$-th input component $u_i$, and $w_k = w \sim \mathcal{N}(\cdot; 0, 1)$, where $\mathcal{N}(\cdot; \eta, \nu)$ denotes a Gaussian density function with mean $\eta$ and covariance matrix $\nu^2$. This expresses the continuous kernel $T_x : \mathcal{B}(\mathbb{R}^n) \times \mathcal{D} \times \mathcal{U} \to [0, 1]$ as

$$\mathcal{N}\Big(\cdot; A_q x + B_q u + x \sum_{i=1}^{v} N_{q,i} u_i + F_q, G_q\Big). \tag{2}$$

In Case Studies 1–3 (Sects. 4.1–4.3), we look at the special instance from [22], where the dynamics are autonomous (no actions) and linear: here $T_x$ is

$$\mathcal{N}(\cdot; A_q x + F_q, G_q), \tag{3}$$

where, in Case Studies 1 and 3, $\mathcal{Q}$ consists of a single element.

**Definition 2.** *A Markov decision process (* mdp*) [5] is a discrete-time model defined as the tuple*

$$\mathcal{H} = (\mathcal{Q}, \mathcal{U}, T_q), \quad \text{where} \tag{4}$$


Whenever the set of actions is trivial or a policy is synthesised and used (cf. discussion in Sect. 2.2), the mdp reduces to a Markov chain (mc), and a kernel $T_q : \mathcal{Q} \times \mathcal{Q} \to [0, 1]$ assigns to each $q \in \mathcal{Q}$ a distribution over $\mathcal{Q}$ as $T_q(\cdot|q)$.
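The reduction from an mdp to an mc under a fixed decision rule can be sketched as follows. This is only an illustration under stated assumptions: the kernel is stored as one dense matrix per action (`T[u][q][qNext]`), the policy is a single memoryless map `mu` from states to action indices, and `inducedChain` is a hypothetical name that does not belong to StocHy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// T[u][q][qNext] = T_q(qNext | q, u); one row-stochastic matrix per action.
using Matrix = std::vector<std::vector<double>>;

// Fixes the action mu[q] in every state q and returns the induced
// Markov-chain matrix P with P[q][qNext] = T_q(qNext | q, mu[q]).
Matrix inducedChain(const std::vector<Matrix>& T, const std::vector<std::size_t>& mu) {
    assert(!T.empty() && mu.size() == T[0].size());
    Matrix P(mu.size());
    for (std::size_t q = 0; q < mu.size(); ++q) {
        assert(mu[q] < T.size());  // the chosen action must exist
        P[q] = T[mu[q]][q];        // copy the row selected by the policy
    }
    return P;
}
```

For a time-varying strategy, one such matrix would be obtained per time step, giving a time-inhomogeneous chain.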

**Definition 3.** *An interval Markov decision process (* imdp*) [26] extends the syntax of an* mdp *by allowing for uncertain* $T_q$*, and is defined as the tuple*

$$\mathcal{H} = (\mathcal{Q}, \mathcal{U}, \check{P}, \hat{P}), \quad \text{where} \tag{5}$$



*For all* $q, q' \in \mathcal{Q}$ *and* $u \in \mathcal{U}$*, it holds that* $\check{P}(q'|q, u) \le \hat{P}(q'|q, u)$ *and*

$$\sum_{q' \in \mathcal{Q}} \check{P}(q'|q, u) \le 1 \le \sum_{q' \in \mathcal{Q}} \hat{P}(q'|q, u).$$

*Note that when* $\check{P}(\cdot|q, u) = \hat{P}(\cdot|q, u)$*, the* imdp *reduces to the* mdp *with* $\check{P}(\cdot|q, u) = \hat{P}(\cdot|q, u) = T_q(\cdot|q, u)$*.*
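The two conditions on the bounds in Definition 3 are easy to check numerically for a single state-action pair. The sketch below assumes the lower and upper bounds for one pair $(q, u)$ are given as dense rows indexed by $q'$. The function name `isValidIntervalRow` and the tolerance `tol` are hypothetical, and the extra check that each bound lies in $[0, 1]$ is an assumption about the codomain of $\check{P}$ and $\hat{P}$.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Row = std::vector<double>;

// Checks, for one state-action pair (q, u), that lower[q'] <= upper[q'] for
// every successor q' and that sum(lower) <= 1 <= sum(upper).
// Assumption: every bound is itself a probability in [0, 1].
bool isValidIntervalRow(const Row& lower, const Row& upper, double tol = 1e-12) {
    assert(lower.size() == upper.size());
    double sumLower = 0.0;
    double sumUpper = 0.0;
    for (std::size_t i = 0; i < lower.size(); ++i) {
        if (lower[i] < 0.0 || upper[i] > 1.0 + tol || lower[i] > upper[i]) return false;
        sumLower += lower[i];
        sumUpper += upper[i];
    }
    return sumLower <= 1.0 + tol && sumUpper >= 1.0 - tol;
}
```

When `lower` and `upper` coincide, the check reduces to requiring that the row sums to one, which matches the mdp special case noted in the definition.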

#### **2.2 Formal Verification and Strategy Synthesis via Abstractions**

Formal verification and strategy synthesis over shs are in general not decidable [4,30], and can be tackled via quantitative finite abstractions. These are precise approximations that come in two main flavours: abstractions into mdp [4,28] and into imdp [22]. Once the finite abstractions are obtained, and with focus on specifications expressed in (non-nested) pctl or fragments of ltl [5], formal verification or strategy synthesis can be performed via probabilistic model checking tools, such as prism [21], storm [12], or iscasMc [17]. We overview next the two alternative abstractions, as implemented in StocHy.

**Abstractions into Markov decision processes.** Following [27], mdp are generated by either (i) uniformly gridding the state space and computing an abstraction error, which depends on the continuity of the underlying continuous dynamics and on the chosen grid; or (ii) generating the grid adaptively and sequentially, by splitting the cells with the largest local abstraction error until a desired global abstraction error is achieved. The two approaches display an intuitive trade-off, where the first in general requires more memory but less time, whereas the second generates smaller abstractions. Either way, the probability to transit from each cell in the grid into any other cell characterises the mdp matrix $T_q$. Further details can be found in [28]. StocHy newly provides a c++ implementation and employs sparse matrix representation and manipulation, in order to attain faster generation of the abstraction and use in formal verification or strategy synthesis.
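To make the uniform-gridding option concrete, the sketch below abstracts a scalar, autonomous instance of kernel (3) into a transition matrix. It is a simplified illustration and not StocHy's implementation: the state is one-dimensional, the kernel is evaluated at cell centres (a common choice of representative point, assumed here), the third argument `g` is treated as the standard deviation, dense storage is used instead of sparse matrices, and no abstraction error is computed. The names `normalCdf` and `gridAbstraction` are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Standard normal cumulative distribution function.
double normalCdf(double z) { return 0.5 * std::erfc(-z / std::sqrt(2.0)); }

// Abstracts x' ~ N(a*x + f, g^2) on [lo, hi] into n equal cells.
// T[i][j] approximates the probability of moving from cell i to cell j,
// evaluated at the centre of cell i. Rows may sum to less than 1: the
// missing mass is the probability of leaving [lo, hi].
std::vector<std::vector<double>> gridAbstraction(double a, double f, double g,
                                                 double lo, double hi, std::size_t n) {
    assert(n > 0 && hi > lo && g > 0.0);
    const double width = (hi - lo) / static_cast<double>(n);
    std::vector<std::vector<double>> T(n, std::vector<double>(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        const double mean = a * (lo + (static_cast<double>(i) + 0.5) * width) + f;
        for (std::size_t j = 0; j < n; ++j) {
            const double left = lo + static_cast<double>(j) * width;
            // Gaussian mass of the interval [left, left + width].
            T[i][j] = normalCdf((left + width - mean) / g) - normalCdf((left - mean) / g);
        }
    }
    return T;
}
```

The mass missing from each row can be assigned to an absorbing "outside" state when the abstraction is used for safety verification.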

*Verification via* mdp (when the action set is trivial) is performed to check the abstraction against non-nested, bounded-until specifications in pctl [5] or *cosafe linear temporal logic* (csltl) [20].

*Strategy synthesis via* mdp is defined as follows. Consider the class of deterministic and memoryless Markov strategies $\pi = (\mu_0, \mu_1, \ldots)$ where $\mu_k : \mathcal{Q} \to \mathcal{U}$. We compute the strategy $\pi$ that maximises the probability of satisfying a formula, with algorithms discussed in [28].

**Abstraction into Interval Markov decision processes** (imdp) is based on a procedure in [11] performed using a uniform grid and with a finite set of actions $\mathcal{U}$ (see Remark 2). StocHy newly provides the option to generate a grid using adaptive/sequential refinements (similar to the case in the paragraph above) [27], which is performed as follows: (i) define a required maximum allowable abstraction error $\varepsilon_{max}$; (ii) generate a coarse abstraction using the algorithm in [11] and compute the local error $\varepsilon_q$ that is associated to each abstract state $q$; (iii) split all cells where $\varepsilon_q > \varepsilon_{max}$ along the main axis of each dimension, and update the probability bounds (and errors); and (iv) repeat this process until $\forall q$, $\varepsilon_q < \varepsilon_{max}$.
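Steps (iii) and (iv) of the refinement procedure can be sketched as a simple loop. The example below is a one-dimensional illustration under loud assumptions: the local error is supplied by the caller as a stand-in for the error computation of [11], cells are halved rather than split per dimension, and the names `Cell`, `refine` and `maxRounds` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

struct Cell {
    double lo;
    double hi;
};

// Repeatedly halves every cell whose local error exceeds epsMax and stops
// once no cell was split in a full pass (or after maxRounds passes).
// localError is a caller-supplied stand-in for the tool's error computation.
std::vector<Cell> refine(std::vector<Cell> cells,
                         const std::function<double(const Cell&)>& localError,
                         double epsMax, std::size_t maxRounds = 64) {
    assert(epsMax > 0.0);
    for (std::size_t round = 0; round < maxRounds; ++round) {
        std::vector<Cell> next;
        bool split = false;
        for (const Cell& c : cells) {
            if (localError(c) > epsMax) {
                const double mid = 0.5 * (c.lo + c.hi);
                next.push_back({c.lo, mid});
                next.push_back({mid, c.hi});
                split = true;
            } else {
                next.push_back(c);
            }
        }
        cells = std::move(next);
        if (!split) break;
    }
    return cells;
}
```

Recomputing the error in every pass mirrors the "update the probability bounds (and errors)" part of step (iii). The cap on the number of passes guards against error functions that never fall below the threshold.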

*Verification via* imdp is run over properties in csltl or bounded-LTL (bltl) form using the imdp model checking algorithm in [22].

*Synthesis via* imdp [11] is carried out by extending the notions of strategies of mdp to depend on memory, that is on prefixes of paths.

#### **2.3 Analysis via Monte Carlo Simulations**

Monte Carlo techniques generate numerical sampled trajectories representing the evolution of a stochastic process over a predetermined time horizon. Given a sufficient number of trajectories, one can approximate the statistical properties of the solution process with a required confidence level. This approach has been adopted for simulation of different types of shs. [19] applies sequential Monte Carlo simulation to shs to reason about rare-event probabilities. [13] performs Monte Carlo simulations of classes of shs described as Petri nets. [8] proposes a methodology for efficient Monte Carlo simulations of continuous-time shs. In this work, we analyse an shs model using Monte Carlo simulations following the approach in [4]. Additionally, we generate histogram plots at each time step, providing further insight on the evolution of the solution process.
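As a rough illustration of the Monte Carlo idea, the sketch below samples trajectories of a scalar instance of the linear dynamics in (3) and estimates the probability of remaining in an interval. It is not the simulator library of StocHy: the process is one-dimensional with a single mode, `g` is treated as the standard deviation, and the function name `estimateSafety` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <random>

// Estimates P(x_k in [lo, hi] for all k = 0..K) for the scalar process
// x_{k+1} = a*x_k + f + g*w_k with w_k ~ N(0, 1), using `runs` sampled
// trajectories that all start from x0.
double estimateSafety(double a, double f, double g, double x0,
                      double lo, double hi, int K, std::size_t runs, unsigned seed) {
    assert(runs > 0 && K >= 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, 1.0);
    std::size_t safeRuns = 0;
    for (std::size_t r = 0; r < runs; ++r) {
        double x = x0;
        bool safe = (lo <= x && x <= hi);
        for (int k = 0; k < K && safe; ++k) {
            x = a * x + f + g * noise(rng);  // one step of the dynamics
            safe = (lo <= x && x <= hi);
        }
        if (safe) ++safeRuns;
    }
    return static_cast<double>(safeRuns) / static_cast<double>(runs);
}
```

The estimate is a sample mean, so its accuracy improves with the number of runs. Fixing the seed makes a run reproducible.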

### **3 Overview of StocHy**

**Installation.** StocHy is set up using the provided get_dep file found within the distribution package, which automatically installs all the required dependencies. The executable run.sh builds and runs StocHy. This basic installation setup has been successfully tested on machines running Ubuntu 18.04.1 LTS GNU and Linux operating systems.

**Input interface.** The user interacts with StocHy via the main file and must specify (i) a high-level description of the model dynamics and (ii) the task to be performed. The description of model dynamics can take the form of a list of the transition probabilities between the discrete modes, and of the state-space models for the continuous variables in each mode; alternatively, a description can be obtained by specifying a path to a matlab file containing the model description in state-space form together with the transition probability matrix. Tasks can be of three kinds (each admitting specific parameters): simulation, verification, or synthesis. The general structure of the input interface is illustrated via an example in Listing 1.1: here the user is interested in simulating a shs with two discrete modes $\mathcal{Q} = \{q_0, q_1\}$ and two continuous variables that evolve according to (3). The model is autonomous and has no control actions. The relationship between the discrete modes is defined by a fixed transition probability (line 1). The evolution of the continuous dynamics is defined in lines 2–14. The initial conditions for both the discrete modes and the continuous variables are set in lines 16–21 (this is needed for simulation tasks). The equivalent shs model is then set up by instantiating an object of type shs_t<arma::mat,int> (line 23). Next, the task is defined in line 27 (simulation with a time horizon K = 32, as specified in line 25 and using the simulator library, as set in line 26). We combine the model and task specification together in line 29. Finally, StocHy carries out the simulation using the function performTask (line 31).

**Modularity.** StocHy comprises independent libraries for different tasks, namely (i) faust<sup>2</sup>, (ii) imdp, and (iii) simulator. Each of the libraries is separate and depends only on the model structure that has been entered. This allows for seamless extensions of individual sub-modules with new or existing tools and methods. The function performTask acts as a multiplexer for calling any of the libraries depending on the input model and task specification.

**Data structures.** StocHy makes use of multiple techniques to minimise computational overhead. It employs vector algebra for efficient handling of linear operations, and whenever possible it stores and manipulates matrices as sparse structures. It uses the linear algebra library Armadillo [24,25], which applies multi-threading and a sophisticated expression evaluator that has been shown to speed up matrix manipulations in c++ when compared to other libraries. faust<sup>2</sup> based abstractions define the underlying kernel functions symbolically using the library GiNaC [6], for easy evaluation of the stochastic kernels.

**Output interface.** We provide outputs as text files for all three libraries, which are stored within the results folder. We also provide additional python scripts for generating plots as needed. For abstractions based on faust<sup>2</sup>, the user has the additional option to export the generated mdp or mc to prism format, to interface with the popular model checker [21] (StocHy prompts the user with this option following the completion of the verification or synthesis task). As a future extension, we plan to export the generated abstraction models to the model checker storm [12] and to the modelling format jani [9].

### **4 StocHy: Experimental Evaluation**

We apply StocHy to four different case studies highlighting different models and tasks to be performed. All the experiments are run on a standard laptop, with an Intel Core i7-8550U CPU at 1.80 GHz × 8 and with 8 GB of RAM.

#### **4.1 Case Study 1 - Formal Verification**

We consider the shs model first presented in [2]. The model takes the form of (1), and has one discrete mode and two continuous variables representing the level of CO<sub>2</sub> ($x_1$) and the ambient temperature ($x_2$), respectively. The continuous variables evolve according to

$$x_{1,k+1} = x_{1,k} + \frac{\Delta}{V}(-\rho_m x_{1,k} + \varrho_c (C_{out} - x_{1,k})) + \sigma_1 w_k,\tag{6}$$

$$x_{2,k+1} = x_{2,k} + \frac{\Delta}{C_z}(\rho_m C_{pa} (T_{set} - x_{2,k}) + \frac{\varrho_c}{R}(T_{out} - x_{2,k})) + \sigma_2 w_k,$$

where $\Delta$ is the sampling time [min], $V$ is the volume of the zone [m<sup>3</sup>], $\rho_m$ is the mass air flow pumped inside the room [m<sup>3</sup>/min], $\varrho_c$ is the natural drift air flow [m<sup>3</sup>/min], $C_{out}$ is the outside CO<sub>2</sub> level [ppm/min], $T_{set}$ is the desired temperature [°C], $T_{out}$ is the outside temperature [°C/min], $C_z$ is the zone capacitance [Jm<sup>3</sup>/°C], $C_{pa}$ is the specific heat capacity of air [J/°C], $R$ is the resistance to heat transfer [°C/J], and $\sigma_{(\cdot)}$ is a variance term associated to the noise $w_k \sim \mathcal{N}(0, 1)$.
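A single step of the dynamics in (6) can be written out directly. The sketch below is illustrative only: the struct and function names (`ZoneParams`, `ZoneState`, `step`) are hypothetical, no parameter values from the case study are reproduced, and the two noise samples are passed in by the caller because (6) does not make explicit whether both equations share one sample.

```cpp
#include <cassert>

// Parameters of (6); field names are illustrative.
struct ZoneParams {
    double delta;   // sampling time
    double V;       // volume of the zone
    double rhoM;    // mass air flow pumped inside the room
    double rhoC;    // natural drift air flow
    double cOut;    // outside CO2 level
    double tSet;    // desired temperature
    double tOut;    // outside temperature
    double cZ;      // zone capacitance
    double cPa;     // specific heat capacity of air
    double R;       // resistance to heat transfer
    double sigma1;  // noise scaling for the CO2 level
    double sigma2;  // noise scaling for the temperature
};

struct ZoneState {
    double co2;   // x1
    double temp;  // x2
};

// One step of (6); w1 and w2 are noise samples drawn by the caller.
ZoneState step(const ZoneState& x, const ZoneParams& p, double w1, double w2) {
    assert(p.V != 0.0 && p.cZ != 0.0 && p.R != 0.0);
    ZoneState next;
    next.co2 = x.co2
             + p.delta / p.V * (-p.rhoM * x.co2 + p.rhoC * (p.cOut - x.co2))
             + p.sigma1 * w1;
    next.temp = x.temp
              + p.delta / p.cZ * (p.rhoM * p.cPa * (p.tSet - x.temp)
                                  + p.rhoC / p.R * (p.tOut - x.temp))
              + p.sigma2 * w2;
    return next;
}
```

Setting both noise scalings to zero gives the deterministic drift of the model, which is convenient for sanity checks.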

We are interested in verifying whether the continuous variables remain within the safe set $X_{safe} = [405, 540] \times [18, 24]$ over 45 min ($K = 3$). This property can be encoded as a bltl property, $\varphi_1 := \square^{\le K} X_{safe}$, where $\square$ is the "*always*" temporal operator considered over a finite horizon. The semantics of bltl is defined over finite traces, denoted by $\zeta = \{\zeta_j\}_{j=0}^{K}$. A trace $\zeta$ satisfies $\varphi_1$ if $\forall j \le K$, $\zeta_j \in X_{safe}$, and we quantify the probability that traces generated by the shs satisfy $\varphi_1$.
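The satisfaction condition for $\varphi_1$ on a single finite trace amounts to a membership test at every step. The sketch below assumes a two-dimensional state and a rectangular safe set; the names `Point`, `Box`, `inside` and `satisfiesAlways` are hypothetical and are not part of StocHy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Point {
    double x1;
    double x2;
};

// Axis-aligned rectangle [lo1, hi1] x [lo2, hi2].
struct Box {
    double lo1, hi1, lo2, hi2;
};

bool inside(const Box& b, const Point& p) {
    return b.lo1 <= p.x1 && p.x1 <= b.hi1 && b.lo2 <= p.x2 && p.x2 <= b.hi2;
}

// True iff zeta_j lies in the safe set for every j = 0..K.
bool satisfiesAlways(const std::vector<Point>& trace, const Box& safe, std::size_t K) {
    assert(trace.size() > K);  // the trace must contain zeta_0 .. zeta_K
    for (std::size_t j = 0; j <= K; ++j) {
        if (!inside(safe, trace[j])) return false;
    }
    return true;
}
```

The fraction of sampled traces for which `satisfiesAlways` returns true is a Monte Carlo estimate of the probability that the verification task quantifies.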

*Case study 1:* Listings explaining task specification for (a) faust<sup>2</sup> and (b) imdp

```
// Dynamics definition
shs_t<arma::mat,int> myShs("../CS1.mat");
// Specification for FAUST^2
// safe set
arma::mat safe = {{405,540},{18,24}};
// max error
double eps = 1;
// grid type
// (1 = uniform, 2 = adaptive)
int gridType = 1;
// time horizon
int K = 3;
// task and property type
// (1 = verify safety, 2 = verify reach-avoid,
// 3 = safety synthesis, 4 = reach-avoid synthesis)
int p = 1;
// library (1 = simulator, 2 = faust^2, 3 = imdp)
int lb = 2;
// task specification
taskSpec_t mySpec(lb,K,p,safe,eps,gridType);
```

Listing 1.2: (a) faust<sup>2</sup>

```
// Dynamics definition
shs_t<arma::mat,int> myShs("../CS1.mat");
// Specification for IMDP
// safe set
arma::mat safe = {{405,540},{18,24}};
// grid size for each dimension
arma::mat grid = {{0.0845,0.0845}};
// relative tolerance
arma::mat reft = {{1,1}};
// time horizon
int K = 3;
// task and property type
// (1 = verify safety, 2 = verify reach-avoid,
// 3 = safety synthesis, 4 = reach-avoid synthesis)
int p = 1;
// library (1 = simulator, 2 = faust^2, 3 = imdp)
int lb = 3;
// task specification
taskSpec_t mySpec(lb,K,p,safe,grid,reft);
```

Listing 1.3: (b) imdp

When tackled with the method based on faust<sup>2</sup>, which hinges on the computation of Lipschitz constants, this verification task is numerically tricky, in view of the difference in dimensionality of the range of $x_1$ and $x_2$ within the safe set $X_{safe}$ and of the variance associated with each dimension, $G_{q_0} = \begin{bmatrix} \sigma_1 & 0 \\ 0 & \sigma_2 \end{bmatrix} = \begin{bmatrix} 40.096 & 0 \\ 0 & 0.511 \end{bmatrix}$. In order to mitigate this, StocHy automatically rescales the state space so all the dynamics evolve in a comparable range.

**Implementation.** StocHy provides two verification methods, one based on faust<sup>2</sup> and the second based on imdp. We parse the model from file cs1.mat (see line 2 of Listings 1.2(a) and 1.3(b), corresponding to the two methods). cs1.mat sets the parameter values for (6) and uses $\Delta = 15$ [min]. As anticipated, we employ both techniques over the same model description:

- for faust<sup>2</sup> we specify the safe set ($X_{safe}$), the maximum allowable error, the grid type (whether uniform or adaptive grid), the time horizon, together with the type of property of interest (safety or reach-avoid). This is carried out in lines 5–21 in Listing 1.2(a).
- for the imdp method, we define the safe set ($X_{safe}$), the grid size, the relative tolerance, the time horizon and the property type. This can be done by defining the task specification using lines 5–21 in Listing 1.3(b).

**Table 1.** *Case study 1:* Comparison of verification results for $\varphi_1$ when using faust<sup>2</sup> vs imdp.

**Fig. 1.** *Case study 1:* Lower bound probability of satisfying $\varphi_1$ generated using imdp with 3481 states.

Finally, to run either of the methods on the defined input model, we combine the model and the task specification using inputSpec_t<arma::mat,int> myInput(myShs,mySpec), then run the command performTask(myInput). The verification results for both methods are stored in the results directory.


**Outcomes.** We perform the verification task using both faust<sup>2</sup> and imdp, over different sizes of the abstraction grid. We employ uniform gridding for both methods. We further compare the outcomes of StocHy against those of the faust<sup>2</sup> tool, which is implemented in matlab [28]. Note that the imdp consists of |Q| + 1 states, where the additional state is the sink state q<sub>u</sub> = D \ X<sub>safe</sub>. The results are shown in Table 1. We saturate (conservative) error outputs that are greater than 1 to this value. We show the probability of satisfying the formula obtained from imdp for a grid size of 3481 states in Fig. 1; similar probabilities are obtained for the remaining grid sizes. As evident from Table 1, the new imdp method outperforms the approach using faust<sup>2</sup> in terms of the maximum error associated with the abstraction (faust<sup>2</sup> generates an abstraction error < 1 only with 4225 states). Comparing faust<sup>2</sup> within StocHy with the original faust<sup>2</sup> implementation (running in matlab), StocHy offers a computational speed-up for the same grid size. This is due to the faster computation of the transition probabilities, through StocHy's use of matrix manipulations. faust<sup>2</sup> within StocHy also simplifies the input of the dynamical model description: in the original faust<sup>2</sup> implementation, the user is asked to manually input the stochastic kernel in the form of symbolic equations in a matlab script. This is not required when using StocHy, which automatically generates the underlying symbolic kernels from the input state-space model descriptions.

**Fig. 2.** *Case study 2:* (a) Gridded domain together with a superimposed simulation of a trajectory initialised at (−0.5, −1) within q<sub>0</sub>, under the synthesised optimal switching strategy π<sup>∗</sup>. Lower probabilities of satisfying ϕ<sub>2</sub> for mode q<sub>0</sub> (b) and for mode q<sub>1</sub> (c), as computed by StocHy.

#### **4.2 Case Study 2 - Strategy Synthesis**

We consider a stochastic process with two modes Q = {q0, q1}, which continuously evolves according to (3) with

$$A_{q_0} = \begin{bmatrix} 0.43 & 0.52 \\ 0.65 & 0.12 \end{bmatrix},\quad G_{q_0} = \begin{bmatrix} 1 & 0.1 \\ 0 & 0.1 \end{bmatrix},\quad A_{q_1} = \begin{bmatrix} 0.65 & 0.12 \\ 0.52 & 0.43 \end{bmatrix},\quad G_{q_1} = \begin{bmatrix} 0.2 & 0 \\ 0 & 0.2 \end{bmatrix},\quad F_{q_i} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$

and i ∈ {0, 1}. Consider the continuous domain shown in Fig. 2a over both discrete locations. We plan to synthesise the optimal switching strategy π<sup>∗</sup> that maximises the probability of reaching the *green* region, whilst avoiding the *purple* one, over an unbounded time horizon, given any initial condition within the domain. This can be expressed with the ltl formula ϕ<sub>2</sub> := (¬purple) U green, where U is the "*until*" temporal operator, and the atomic propositions {purple, green} denote regions within the set X = [−1.5, 1.5]<sup>2</sup> (see Fig. 2a).

**Implementation.** We define the model dynamics following lines 3–14 in Listing 1.1, while we use Listing 1.3 to specify the synthesis task together with its associated parameters. The ltl property ϕ<sub>2</sub> is over an unbounded time horizon, which leads to employing the imdp method for synthesis (recall that the faust<sup>2</sup> implementation can only handle time-bounded properties, and its abstraction error monotonically increases with the time horizon of the formula). In order to encode the task, we set the variable safe to correspond to X, the grid size to 0.12 and the relative tolerance to 0.06 along both dimensions (cf. lines 5–10 in Listing 1.3). We set the time horizon K = -1 to represent an unbounded time horizon, let p = 4 to trigger the synthesis engine over the given specification, and set lb = 3 to use the imdp method (cf. lines 12–19 in Listing 1.3). This task specification partitions the set X into the underlying imdp via uniform gridding. Alternatively, the user has the option to make use of the adaptive-sequential algorithm by defining a new variable eps_max, which characterises the maximum allowable abstraction error, and then specifying the task using taskSpec_t mySpec(lb,K,p,boundary,eps_max,grid,rtol);. Next, we define two files (phi1.txt and phi2.txt) containing the coordinates within the gridded domain (see Fig. 2a) associated with the atomic propositions *purple* and *green*, respectively. This allows for automatic labelling of the state space over which synthesis is to be performed. Running the main file, StocHy generates a Solution.txt file within the results folder. This contains the synthesised policy π<sup>∗</sup>, the lower bound for the probabilities of satisfying ϕ<sub>2</sub>, and the local errors ε<sub>q</sub> for any region q.

**Outcomes.** The case study generates an abstraction with a total of 2410 states, a maximum probability of 1, a maximum abstraction error of 0.21, and it requires a total time of 1639.3 [s]. In this case, we witness a slightly larger abstraction error via the imdp method than in the previous case study. This is due to the non-diagonal covariance matrix G<sub>q<sub>0</sub></sub>, which introduces a rotation in X within mode q<sub>0</sub>. When labelling the states associated with the regions purple and green, an additional error is introduced due to the over- and under-approximation of states associated with each of the two regions. We further show the simulation of a trajectory under π<sup>∗</sup> with a starting point of (−0.5, −1) in q<sub>0</sub>, within Fig. 2a.
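To make the synthesis step more concrete, the following is a minimal sketch of interval value iteration for a lower bound on an *until* property over an interval MDP. It is not StocHy's implementation: the container `IntervalMdp`, the function names and the fixed iteration count are assumptions made for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical interval-MDP container: lo[s][a][t] and hi[s][a][t] bound the
// probability of moving from state s to state t under action a.
struct IntervalMdp {
    std::vector<std::vector<std::vector<double>>> lo, hi;
};

// Smallest expectation of v over all distributions p with lo <= p <= hi, sum(p) = 1.
double worstCaseExpectation(const std::vector<double>& lo,
                            const std::vector<double>& hi,
                            const std::vector<double>& v) {
    assert(lo.size() == v.size() && hi.size() == v.size());
    std::vector<std::size_t> order(v.size());
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&v](std::size_t a, std::size_t b) { return v[a] < v[b]; });
    double budget = 1.0;
    for (double l : lo) budget -= l;  // mass left after honouring lower bounds
    double value = 0.0;
    for (std::size_t t : order) {     // spare mass goes to low-value successors first
        const double extra = std::min(hi[t] - lo[t], std::max(budget, 0.0));
        value += (lo[t] + extra) * v[t];
        budget -= extra;
    }
    return value;
}

// Lower bound on the probability of (not avoid) U goal, maximised over actions.
std::vector<double> lowerBoundUntil(const IntervalMdp& m,
                                    const std::vector<bool>& goal,
                                    const std::vector<bool>& avoid,
                                    int iterations) {
    const std::size_t n = m.lo.size();
    std::vector<double> v(n, 0.0);
    for (std::size_t s = 0; s < n; ++s)
        if (goal[s]) v[s] = 1.0;
    for (int k = 0; k < iterations; ++k) {
        std::vector<double> next = v;
        for (std::size_t s = 0; s < n; ++s) {
            if (goal[s] || avoid[s]) continue;  // absorbing: value stays 1 or 0
            double best = 0.0;
            for (std::size_t a = 0; a < m.lo[s].size(); ++a)
                best = std::max(best, worstCaseExpectation(m.lo[s][a], m.hi[s][a], v));
            next[s] = best;
        }
        v = next;
    }
    return v;
}
```

The inner minimisation picks the adversarial distribution within the intervals, while the outer maximisation over actions plays the role of the switching strategy. An unbounded horizon would require iterating to convergence rather than for a fixed number of steps.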

#### **4.3 Case Study 3 - Scaling in Continuous Dimension of Model**

We now focus on the continuous dynamics by considering a stochastic process with Q = {q<sub>0</sub>} (single mode) and dynamics evolving according to (3), characterised by A<sub>q<sub>0</sub></sub> = 0.8**I**<sub>d</sub>, F<sub>q<sub>0</sub></sub> = **0**<sub>d</sub> and G<sub>q<sub>0</sub></sub> = 0.2**I**<sub>d</sub>, where d corresponds to the number of continuous variables. We are interested in checking the ltl specification ϕ<sub>3</sub> := □X<sub>safe</sub>, where X<sub>safe</sub> = [−1, 1]<sup>d</sup>, as the continuous dimension d of the model varies. Here "□" is the "*always*" temporal operator and a trace ζ satisfies ϕ<sub>3</sub> if ∀k ≥ 0, ζ<sub>k</sub> ∈ X<sub>safe</sub>. In view of the focus on scalability for this Case Study 3, we do not discuss the computed probabilities, which we instead covered in Sect. 4.1.
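As a rough illustration of the safety property, the sketch below estimates by Monte Carlo simulation the probability of staying in [−1, 1]<sup>d</sup> over a finite number of steps. It assumes that (3) takes the form x<sub>k+1</sub> = a·x<sub>k</sub> + g·w<sub>k</sub> with standard normal noise and that traces start at the origin; the function name and its parameters are hypothetical, and this is neither the imdp abstraction nor StocHy's simulator.

```cpp
#include <cassert>
#include <cmath>
#include <random>
#include <vector>

// Fraction of sampled traces of x_{k+1} = a*x_k + g*w_k, with w_k ~ N(0, I_d)
// and the same scalars a and g on every coordinate, that stay inside
// [-1, 1]^d for `steps` steps when started at the origin.
double estimateSafety(int d, double a, double g, int steps, int traces, unsigned seed) {
    assert(d > 0 && steps >= 0 && traces > 0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> noise(0.0, 1.0);
    int safeTraces = 0;
    for (int n = 0; n < traces; ++n) {
        std::vector<double> x(d, 0.0);
        bool safe = true;
        for (int k = 0; k < steps && safe; ++k) {
            for (int i = 0; i < d; ++i) {
                x[i] = a * x[i] + g * noise(rng);
                if (std::fabs(x[i]) > 1.0) safe = false;  // left the safe set
            }
        }
        if (safe) ++safeTraces;
    }
    return static_cast<double>(safeTraces) / traces;
}
```

For instance, `estimateSafety(d, 0.8, 0.2, 32, 5000, 1u)` gives a finite-horizon estimate for dimension `d`. Unlike the abstraction-based bounds, such an estimate carries no formal guarantee and cannot cover the unbounded horizon of ϕ<sub>3</sub>.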

**Implementation.** Similar to Case Study 2, we follow lines 3–14 in Listing 1.1 to define the model dynamics, while we use Listing 1.3 to specify the verification task using the imdp method. For this example, we employ a uniform grid having a grid size of 1 and relative tolerance of 1 for each dimension (cf. lines 5–10 in Listing 1.3). We set K = -1 to represent an unbounded time horizon, p = 1 to perform verification over a safety property and lb = 3 to use the imdp method (cf. lines 12–19 in Listing 1.3). In Table 2 we list the number of states required for each dimension, the total computational time, and the maximum error associated with each abstraction.

**Table 2.** *Case study 3:* Verification results of the imdp-based approach over ϕ<sub>3</sub>, for varying dimension d of the stochastic process.

**Outcomes.** From Table 2 we can deduce that by employing the imdp method within StocHy, the generated abstract models have manageable state spaces, thanks to the tight error bounds that are obtained. Notice that since the number of cells per dimension is increased with the dimension d of the model, the associated abstraction error ε<sub>max</sub> is decreased. The small error is also due to the underlying contractive dynamics of the process. This is a key fact leading to scalability over the continuous dimension d of the model: StocHy displays a significant improvement in scalability over the state of the art [28] and allows abstracting stochastic models with relevant dimensionality. Furthermore, StocHy is capable of handling specifications over infinite horizons (such as the considered *until* formula).

#### **4.4 Case Study 4 - Simulations**

For this last case study, we refer to the CO<sub>2</sub> model described in Case Study 1 (Sect. 4.1). We extend the CO<sub>2</sub> model to capture (i) the effect of occupants leaving or entering the zone within a time step and (ii) the opening or closing of the windows in the zone [2]. ρ<sub>m</sub> is now a control input and is an exogenous signal. This can be described as a shs comprising two-dimensional dynamics, over discrete modes in the set {q<sub>0</sub> = (E, C), q<sub>1</sub> = (F, C), q<sub>2</sub> = (F, O), q<sub>3</sub> = (E, O)} describing possible configurations of the room (empty (E) or full (F), and with windows open (O) or closed (C)). A mc representing the discrete modes and their dynamics is in Fig. 3a. The continuous variables evolve according to Eq. (6), which now captures the effect of switching between discrete modes, as

$$
\begin{aligned}
x_{1,k+1} &= x_{1,k} + \frac{\Delta}{V_A}\bigl(-\rho_m x_{1,k} + \varrho_{o,c}(C_{out} - x_{1,k})\bigr) + \mathbf{1}_F C_{occ,k} + \sigma_1 w_k, \\
x_{2,k+1} &= x_{2,k} + \frac{\Delta}{C_z}\Bigl(\rho_m C_{pa}(T_{set} - x_{2,k}) + \frac{\varrho_{o,c}}{R}(T_{out} - x_{2,k})\Bigr) + \mathbf{1}_F T_{occ,k} + \sigma_2 w_k,
\end{aligned}
\tag{7}
$$

where the additional terms are: ϱ<sub>o,c</sub> is the natural drift air flow, which changes depending on whether the window is open (o) or closed (c) [m<sup>3</sup>/min]; C<sub>occ</sub> is the generated CO<sub>2</sub> level when the zone is occupied (it is multiplied by the indicator function **1**<sub>F</sub>) [ppm/min]; T<sub>occ</sub> is the generated heat due to occupants [°C/min], which couples the dynamics in (7) as T<sub>occ,k</sub> = vx<sub>1,k</sub> + -.

**Fig. 3.** *Case study 4:* (a) mc for the discrete modes of the CO<sub>2</sub> model and (b) input control signal.

**Implementation.** The provided file cs4.mat sets the values of the parameters in (7) and contains the transition probability matrix representing the relationships between discrete modes. We select a sampling time Δ = 15 [min] and simulate the evolution of this dynamical model over a fixed time horizon K = 8 h (i.e. 32 steps) with an initial CO<sub>2</sub> level x<sub>1</sub> ∼ N(450, 25) [ppm] and a temperature level of x<sub>2</sub> ∼ N(17, 2) [°C]. We define the initial conditions using Listing 1.4. Line 2 defines the number of Monte Carlo simulations using the variable monte and sets this to 5000. We instantiate the initial values of the continuous variables using the term x_init, while we set the initial discrete mode using the variable q_init. This is done using lines 4–17, which define an independent normal distribution for each of the continuous variables, from which we sample 5000 points, and set the initial discrete mode to q<sub>0</sub> = (E, C). We define the control signal ρ<sub>m</sub> in line 20 by parsing the file u.txt, which contains discrete values of ρ<sub>m</sub> for each time step (see Fig. 3b). Once the model is defined, we follow Listing 1.1 to perform the simulation. The simulation engine also generates a python script, simPlots.py, which gives the option to visualise the simulation outcomes offline.

**Fig. 4.** *Case study 4:* Single simulation traces for continuous variables (a) x<sub>1</sub>, (b) x<sub>2</sub> and discrete modes (c) q. Histogram plots with respect to time step for (d) x<sub>1</sub>, (e) x<sub>2</sub> and discrete modes (f) q.

**Outcomes.** The generated simulation plots are shown in Fig. 4, which depicts: (i) a sample trace for each continuous variable (the evolution of x<sub>1</sub> is shown in Fig. 4a, x<sub>2</sub> in Fig. 4b) and for the discrete modes (see Fig. 4c); (ii) histograms depicting the range of values the continuous variables can be in during each time step and the associated count (see Fig. 4d for x<sub>1</sub> and Fig. 4e for x<sub>2</sub>); and (iii) a histogram showing the likelihood of being in a discrete mode within each time step (see Fig. 4f). The total time taken to generate the simulations is 48.6 [s].
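The discrete part of such a simulation can be sketched as sampling a path of the mode Markov chain. The code below is an illustrative assumption rather than StocHy's simulator: the function name is hypothetical and the transition matrix must be supplied by the caller, since the values stored in cs4.mat are not reproduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Samples a mode sequence q_0, ..., q_steps from a Markov chain whose
// row-stochastic transition matrix is P (P[i][j] = probability of i -> j).
std::vector<int> sampleModes(const std::vector<std::vector<double>>& P,
                             int q0, int steps, unsigned seed) {
    assert(q0 >= 0 && static_cast<std::size_t>(q0) < P.size());
    std::mt19937 rng(seed);
    std::vector<int> modes{q0};
    for (int k = 0; k < steps; ++k) {
        const std::vector<double>& row = P[modes.back()];
        std::discrete_distribution<int> next(row.begin(), row.end());
        modes.push_back(next(rng));  // draw the successor mode
    }
    return modes;
}
```

With four modes and 32 steps, one call yields a single trace of the kind plotted for q; repeating it with different seeds and counting modes per time step yields a histogram over modes.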

### **5 Conclusions and Extensions**

We have presented StocHy, a new software tool for the quantitative analysis of stochastic hybrid systems. There is a plethora of enticing extensions that we are planning to explore. In the short term, we intend to: (i) interface with other model checking tools such as storm [12] and the modest toolset [16]; (ii) embed algorithms for policy refinement, so we can generate policies for models having numerous continuous input variables [15]; (iii) benchmark the tool against a set of shs models [10]. In the longer term, we plan to extend StocHy such that (i) it can employ a graphical user interface; (ii) it can allow analysis of continuous-time shs; and (iii) it can make use of data structures such as multi-terminal binary decision diagrams [14] to reduce the memory requirements during the construction of the abstract mdp or imdp.

**Acknowledgements.** The authors would also like to thank Kurt Degiorgio, Sadegh Soudjani, Sofie Haesaert, Luca Laurenti, Morteza Lahijanian, Gareth Molyneux and Viraj Brian Wijesuriya. This work is in part funded by the Alan Turing Institute, London, and by Malta's ENDEAVOUR Scholarships Scheme.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Synthesis of Symbolic Controllers: A Parallelized and Sparsity-Aware Approach**

Mahmoud Khaled<sup>1(B)</sup>, Eric S. Kim<sup>2</sup>, Murat Arcak<sup>2</sup>, and Majid Zamani<sup>3,4</sup>

<sup>1</sup> Department of Electrical and Computer Engineering, Technical University of Munich, Munich, Germany

khaled.mahmoud@tum.de

<sup>2</sup> Department of Electrical Engineering and Computer Sciences, University of California Berkeley, Berkeley, CA, USA

{eskim,arcak}@berkeley.edu

<sup>3</sup> Department of Computer Science, University of Colorado Boulder, Boulder, USA

majid.zamani@colorado.edu

<sup>4</sup> Department of Computer Science, Ludwig Maximilian University of Munich, Munich, Germany

**Abstract.** The correctness of control software in many safety-critical applications, such as autonomous vehicles, is crucial. One approach to achieve this goal is through "symbolic control", where complex physical systems are approximated by finite-state abstractions. Then, using those abstractions, provably-correct digital controllers are algorithmically synthesized for concrete systems, satisfying some complex high-level requirements. Unfortunately, the complexity of constructing such abstractions and synthesizing their controllers grows exponentially in the number of state variables in the system. This limits its applicability to simple physical systems.

This paper presents a unified approach that utilizes sparsity of the interconnection structure in dynamical systems for both construction of finite abstractions and synthesis of symbolic controllers. In addition, parallel algorithms are proposed to target high-performance computing (HPC) platforms and Cloud-computing services. The results show remarkable reductions in computation times. In particular, we demonstrate the effectiveness of the proposed approach on a 7-dimensional model of a BMW 320i car by designing a controller to keep the car in the travel lane unless it is blocked.

### **1 Introduction**

Recently, the world has witnessed many emerging safety-critical applications such as smart buildings, autonomous vehicles and smart grids. These applications are examples of cyber-physical systems (CPS). In CPS, embedded control software plays a significant role by monitoring and controlling several physical variables, such as pressure or velocity, through multiple sensors and actuators, and communicates with other systems or with supporting computing servers. A novel approach to designing provably correct embedded control software in an automated fashion is via formal method techniques [10,11], and in particular *symbolic control*.

This work was supported in part by the H2020 ERC Starting Grant AutoCPS and the U.S. National Science Foundation grant CNS-1446145.

© The Author(s) 2019

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 265–281, 2019. https://doi.org/10.1007/978-3-030-17465-1\_15

Symbolic control provides algorithmically provably-correct controllers based on the dynamics of physical systems and some given high-level requirements. In symbolic control, physical systems are approximated by finite abstractions and then discrete (a.k.a. symbolic) controllers are automatically synthesized for those abstractions, using automata-theoretic techniques [5]. Finally, those controllers will be refined to hybrid ones applicable to the original physical systems. Unlike traditional design-then-test workflows, merging design phases with formal verification ensures that controllers are certified-by-construction. Current implementations of symbolic control, unfortunately, take a monolithic view of systems, where the entire system is modeled, abstracted, and a controller is synthesized from the overall state sets. This view interacts poorly with the symbolic approach, whose complexity grows exponentially in the number of state variables in the model. Consequently, the technique is limited to small dynamical systems.

#### **1.1 Related Work**

Recently, two promising techniques were proposed for mitigating the computational complexity of symbolic controller synthesis. The first technique [2] utilizes sparsity of the internal interconnection of dynamical systems to efficiently construct their finite abstractions. It is only presented for constructing abstractions, while controller synthesis is still performed monolithically without taking into account the sparse structure. The second technique [4] provides parallel algorithms targeting high-performance computing (HPC) platforms, but suffers from the state-explosion problem when the number of parallel processing elements (PEs) is fixed. We briefly discuss each of those techniques and propose an approach that efficiently utilizes both of them.

Many abstraction techniques implemented in existing tools, including SCOTS [9], traverse the state space in a brute force way and suffer from an exponential runtime with respect to the number of state variables. The authors of [2] note that a majority of continuous-space systems exhibit a coordinate structure, where the governing equation for each state variable is defined independently. When the equations depend only on a few continuous variables, then they are said to be sparse. They proposed a modification to the traditional brute-force procedure to take advantage of such sparsity only in constructing abstractions. Unfortunately, the authors do not leverage sparsity to improve synthesis of symbolic controllers, which is, practically, more computationally complex. In this paper, we propose a parallel implementation of their technique to utilize HPC platforms. We also show how sparsity can be utilized, using a parallel implementation, during the controller synthesis phase as well.

The framework pFaces [4] is introduced as an acceleration ecosystem for implementations of symbolic control techniques. Parallel implementations of the abstraction and synthesis algorithms, which were originally done serially in SCOTS [9], are introduced as computation kernels in pFaces. The proposed algorithms treat the problem as a data-parallel task and they scale remarkably well as the number of PEs increases. pFaces allows controlling the complexity of symbolic controller synthesis by adding more PEs. The results introduced in [4] outperform all existing tools for abstraction construction and controller synthesis. However, for a fixed number of PEs, the algorithms still suffer from the state-explosion problem.

In this paper, we propose parallel algorithms that utilize the sparsity of the interconnection in the construction of abstraction and controller synthesis. In particular, the main contributions of this paper are twofold:


#### **2 Preliminaries**

Given two sets $A$ and $B$, we denote by $|A|$ the cardinality of $A$, by $2^A$ the set of all subsets of $A$, by $A \times B$ the Cartesian product of $A$ and $B$, and by $A \setminus B$ the Pontryagin difference between the sets $A$ and $B$. Set $\mathbb{R}^n$ represents the $n$-dimensional Euclidean space of real numbers. This symbol is annotated with subscripts to restrict it in the obvious way, e.g., $\mathbb{R}^n_+$ denotes the positive (component-wise) $n$-dimensional vectors. We denote by $\pi_A : A \times B \to A$ the natural projection map on $A$ and define it, for a set $C \subseteq A \times B$, as follows: $\pi_A(C) = \{a \in A \mid \exists_{b \in B}\; (a, b) \in C\}$. Given a map $R : A \to B$ and a set $\mathcal{A} \subseteq A$, we define $R(\mathcal{A}) := \bigcup_{a \in \mathcal{A}} \{R(a)\}$. Similarly, given a set-valued map $Z : A \to 2^B$ and a set $\mathcal{A} \subseteq A$, we define $Z(\mathcal{A}) := \bigcup_{a \in \mathcal{A}} Z(a)$.

We consider general discrete-time nonlinear dynamical systems given in the form of the update equation:

$$
\Sigma: x^+ = f(x, u),
\tag{1}
$$

where $x \in X \subseteq \mathbb{R}^n$ is a state vector and $u \in U \subseteq \mathbb{R}^m$ is an input vector. The system is assumed to start from some initial state $x(0) = x_0 \in X$ and the map $f$ is used to update the state of the system every $\tau$ seconds. Let set $\bar{X}$ be a finite partition on $X$ constructed by a set of hyper-rectangles of identical widths $\eta \in \mathbb{R}^n_+$ and let set $\bar{U}$ be a finite subset of $U$. A finite abstraction of (1) is a finite-state system $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$, where $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$ is a transition relation crafted so that there exists a feedback-refinement relation (FRR) $R \subseteq X \times \bar{X}$ from $\Sigma$ to $\bar{\Sigma}$. Interested readers are referred to [8] for details about FRRs and their usefulness in synthesizing controllers for concrete systems using their finite abstractions.

**Fig. 1.** The sparsity graph of the vehicle example as introduced in [2].

For a system $\Sigma$, an update-dependency graph is a directed graph of vertices representing input variables $\{u_1, u_2, \cdots, u_m\}$, state variables $\{x_1, x_2, \cdots, x_n\}$, and updated state variables $\{x_1^+, x_2^+, \cdots, x_n^+\}$, and edges that connect input (resp. state) variables to the affected updated state variables based on map $f$. For example, Fig. 1 depicts the update-dependency graph of the vehicle case study presented in [2] with the update equation:

$$
\begin{bmatrix} x_1^+ \\ x_2^+ \\ x_3^+ \end{bmatrix} = \begin{bmatrix} f_1(x_1, x_3, u_1, u_2) \\ f_2(x_2, x_3, u_1, u_2) \\ f_3(x_3, u_1, u_2) \end{bmatrix},
$$

for some nonlinear functions $f_1$, $f_2$, and $f_3$. The state variable $x_3$ affects all updated state variables $x_1^+$, $x_2^+$, and $x_3^+$. Hence, the graph has edges connecting $x_3$ to $x_1^+$, $x_2^+$, and $x_3^+$, respectively. As update-dependency graphs become denser, sparsity of their corresponding abstract systems is reduced. The same graph applies to the abstract system $\bar{\Sigma}$.
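The update-dependency graph of the vehicle example can be written down as a small data structure. The sketch below uses 0-based indices and hypothetical names (`Dependency`, `vehicleGraph`, `edgeCount`, `affectedBy`); it only encodes the edges implied by the update equation above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Incoming edges of one updated state variable x_i^+ (0-based indices).
struct Dependency {
    std::vector<int> states;  // j such that x_j feeds x_i^+
    std::vector<int> inputs;  // j such that u_j feeds x_i^+
};

// x1+ = f1(x1, x3, u1, u2), x2+ = f2(x2, x3, u1, u2), x3+ = f3(x3, u1, u2).
std::vector<Dependency> vehicleGraph() {
    return {
        Dependency{{0, 2}, {0, 1}},
        Dependency{{1, 2}, {0, 1}},
        Dependency{{2}, {0, 1}},
    };
}

// Total number of edges; a fully dense graph would have n * (n + m) of them.
std::size_t edgeCount(const std::vector<Dependency>& graph) {
    std::size_t edges = 0;
    for (const Dependency& d : graph) edges += d.states.size() + d.inputs.size();
    return edges;
}

// Indices i of the updated variables x_i^+ that depend on state variable `state`.
std::vector<int> affectedBy(const std::vector<Dependency>& graph, int state) {
    std::vector<int> out;
    for (std::size_t i = 0; i < graph.size(); ++i) {
        const std::vector<int>& s = graph[i].states;
        if (std::find(s.begin(), s.end(), state) != s.end())
            out.push_back(static_cast<int>(i));
    }
    return out;
}
```

Here the graph has 11 of the 15 possible edges, and `affectedBy(vehicleGraph(), 2)` lists all three updated variables, matching the observation that $x_3$ affects every component.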

We sometimes refer to $\bar{X}$, $\bar{U}$, and $T$ as monolithic state set, monolithic input set and monolithic transition relation, respectively. A generic projection map

$$P_i^f: A \to \pi^i(A)$$

is used to extract elements of the corresponding subsets affecting the updated state $\bar{x}_i^+$. Note that $A \subseteq \bar{X} := \bar{X}_1 \times \bar{X}_2 \times \cdots \times \bar{X}_n$ when we are interested in extracting subsets of the state set and $A \subseteq \bar{U} := \bar{U}_1 \times \bar{U}_2 \times \cdots \times \bar{U}_m$ when we are interested in extracting subsets of the input set. When extracting subsets of the state set, $\pi^i$ is the projection map $\pi_{\bar{X}_{k_1} \times \bar{X}_{k_2} \times \cdots \times \bar{X}_{k_K}}$, where $k_j \in \{1, 2, \cdots, n\}$, $j \in \{1, 2, \cdots, K\}$, and $\bar{X}_{k_1} \times \bar{X}_{k_2} \times \cdots \times \bar{X}_{k_K}$ is a subset of states affecting the updated state variable $\bar{x}_i^+$. Similarly, when extracting subsets of the input set, $\pi^i$ is the projection map $\pi_{\bar{U}_{p_1} \times \bar{U}_{p_2} \times \cdots \times \bar{U}_{p_P}}$, where $p_j \in \{1, 2, \cdots, m\}$, $j \in \{1, 2, \cdots, P\}$, and $\bar{U}_{p_1} \times \bar{U}_{p_2} \times \cdots \times \bar{U}_{p_P}$ is a subset of inputs affecting the updated state variable $\bar{x}_i^+$.

For example, assume that the monolithic state (resp. input) set of the system $\bar{\Sigma}$ in Fig. 1 is given by $\bar{X} := \bar{X}_1 \times \bar{X}_2 \times \bar{X}_3$ (resp. $\bar{U} := \bar{U}_1 \times \bar{U}_2$) such that for any $\bar{x} := (\bar{x}_1, \bar{x}_2, \bar{x}_3) \in \bar{X}$ and $\bar{u} := (\bar{u}_1, \bar{u}_2) \in \bar{U}$, one has $\bar{x}_1 \in \bar{X}_1$, $\bar{x}_2 \in \bar{X}_2$, $\bar{x}_3 \in \bar{X}_3$, $\bar{u}_1 \in \bar{U}_1$, and $\bar{u}_2 \in \bar{U}_2$. Now, based on the dependency graph, $P_1^f(\bar{x}) := \pi_{\bar{X}_1 \times \bar{X}_3}(\bar{x}) = (\bar{x}_1, \bar{x}_3)$ and $P_1^f(\bar{u}) := \pi_{\bar{U}_1 \times \bar{U}_2}(\bar{u}) = (\bar{u}_1, \bar{u}_2)$. We can also apply the map to subsets of $\bar{X}$ and $\bar{U}$, e.g., $P_1^f(\bar{X}) = \bar{X}_1 \times \bar{X}_3$, and $P_1^f(\bar{U}) = \bar{U}_1 \times \bar{U}_2$.

For a transition element $t = (\bar{x}, \bar{u}, \bar{x}') \in T$, we define $P_i^f(t) := (P_i^f(\bar{x}), P_i^f(\bar{u}), \pi_{\bar{X}_i}(\bar{x}'))$, for any component $i \in \{1, 2, \cdots, n\}$. Note that for $t$, the successor state $\bar{x}'$ is treated differently as it is related directly to the updated state variable $\bar{x}_i^+$. We can apply the map to subsets of $T$, e.g., for the given update-dependency graph in Fig. 1, one has $P_1^f(T) = \bar{X}_1 \times \bar{X}_3 \times \bar{U}_1 \times \bar{U}_2 \times \bar{X}_1$.

On the other hand, a generic recovery map

$$D_i^f: P_i^f(A) \to 2^A,$$

is used to recover elements (resp. subsets) from the projected subsets back to their original monolithic sets. Similarly, $A \subseteq \bar{X} := \bar{X}_1 \times \bar{X}_2 \times \cdots \times \bar{X}_n$ when we are interested in subsets of the state set and $A \subseteq \bar{U} := \bar{U}_1 \times \bar{U}_2 \times \cdots \times \bar{U}_m$ when we are interested in subsets of the input set.

For the same example in Fig. 1, let $\bar{x} := (\bar{x}_1, \bar{x}_2, \bar{x}_3) \in \bar{X}$ be a state. Now, define $\bar{x}_p := P_1^f(\bar{x}) = (\bar{x}_1, \bar{x}_3)$. We then have $D_1^f(\bar{x}_p) := \{(\bar{x}_1, \bar{x}_2^*, \bar{x}_3) \mid \bar{x}_2^* \in \bar{X}_2\}$. Similarly, for a transition element $t := ((\bar{x}_1, \bar{x}_2, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1', \bar{x}_2', \bar{x}_3')) \in T$ and its projection $t_p := P_1^f(t) = ((\bar{x}_1, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1'))$, the recovered set of transitions is $D_1^f(t_p) = \{((\bar{x}_1, \bar{x}_2^*, \bar{x}_3), (\bar{u}_1, \bar{u}_2), (\bar{x}_1', \bar{x}_2^{\prime *}, \bar{x}_3^{\prime *})) \mid \bar{x}_2^* \in \bar{X}_2,\ \bar{x}_2^{\prime *} \in \bar{X}_2, \text{ and } \bar{x}_3^{\prime *} \in \bar{X}_3\}$.

Given a subset $X \subseteq \bar{X}$, let $[X] := D_1^f \circ P_1^f(X)$. Note that $[X]$ is not necessarily equal to $X$. However, we have that $X \subseteq [X]$. Here, $[X]$ over-approximates $X$.
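The projection and recovery maps on states can be illustrated with a small sketch in which an abstract state is a vector of cell indices. The names `project`, `recover` and `closure`, and the encoding of each set $\bar{X}_j$ as the index range {0, ..., cells[j] − 1}, are assumptions made for this example only.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

using State = std::vector<int>;  // one cell index per state dimension

// P_i^f on states: keep only the coordinates listed in `keep`.
State project(const State& x, const std::vector<int>& keep) {
    State p;
    for (int k : keep) p.push_back(x[k]);
    return p;
}

// D_i^f on states: every full state whose kept coordinates equal p, with each
// remaining coordinate d ranging over {0, ..., cells[d] - 1}.
std::set<State> recover(const State& p, const std::vector<int>& keep,
                        const std::vector<int>& cells) {
    assert(p.size() == keep.size());
    State x(cells.size(), 0);
    std::vector<bool> fixed(cells.size(), false);
    for (std::size_t j = 0; j < keep.size(); ++j) {
        x[keep[j]] = p[j];
        fixed[keep[j]] = true;
    }
    std::set<State> out;
    while (true) {
        out.insert(x);
        std::size_t d = 0;
        for (; d < cells.size(); ++d) {  // odometer over the free coordinates
            if (fixed[d]) continue;
            if (++x[d] < cells[d]) break;
            x[d] = 0;
        }
        if (d == cells.size()) break;    // every free coordinate wrapped around
    }
    return out;
}

// [X] := D ∘ P (X), the over-approximation of a subset of states.
std::set<State> closure(const std::set<State>& subset, const std::vector<int>& keep,
                        const std::vector<int>& cells) {
    std::set<State> out;
    for (const State& x : subset) {
        const std::set<State> lifted = recover(project(x, keep), keep, cells);
        out.insert(lifted.begin(), lifted.end());
    }
    return out;
}
```

With `keep = {0, 2}` (the coordinates $x_1$ and $x_3$ feeding $x_1^+$), recovering a projected state frees the second coordinate, so a two-element subset can grow to six states when $\bar{X}_2$ has three cells, while still containing the original subset.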

For an update map $f$ in (1), a function $\Omega^f : \bar{X} \times \bar{U} \to X \times X$ characterizes hyper-rectangles that over-approximate the reachable sets starting from a set $\bar{x} \in \bar{X}$ when the input $\bar{u}$ is applied. For example, if a growth bound map ($\beta : \mathbb{R}^n \times U \to \mathbb{R}^n$) is used, $\Omega^f$ can be defined as follows:

$$
\Omega^f(\bar{x}, \bar{u}) = (x_{lb}, x_{ub}) := \left( -r + f(\bar{x}_c, \bar{u}),\; r + f(\bar{x}_c, \bar{u}) \right),
$$

where $r = \beta(\eta/2, u)$, and $\bar{x}_c \in \bar{x}$ denotes the centroid of $\bar{x}$. Here, $\beta$ is the growth bound introduced in [8, Section VIII]. An over-approximation of the reachable sets can then be obtained by the map $O^f : \bar{X} \times \bar{U} \to 2^{\bar{X}}$ defined by:

$$O^f(\bar{x}, \bar{u}) := Q \circ \Omega^f(\bar{x}, \bar{u}),$$

where $Q$ is a quantization map defined by:

$$Q(x_{lb}, x_{ub}) = \{ \bar{x}' \in \bar{X} \mid \bar{x}' \cap [\![ x_{lb}, x_{ub} ]\!] \neq \emptyset \},\tag{2}$$

where $[\![ x_{lb}, x_{ub} ]\!] = [x_{lb,1}, x_{ub,1}] \times [x_{lb,2}, x_{ub,2}] \times \cdots \times [x_{lb,n}, x_{ub,n}]$.
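The composition $O^f = Q \circ \Omega^f$ can be sketched in one dimension. The example below assumes a hypothetical scalar system $x^+ = a x + u$, for which $\beta(r, u) = |a|\, r$ is a valid growth bound, and treats grid cells as half-open intervals; the names `Grid1D`, `omega`, `quantize` and `post` are illustrative and not taken from any tool.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <utility>
#include <vector>

// Uniform 1-D grid: cell k covers [lower + k*eta, lower + (k+1)*eta).
struct Grid1D {
    double lower;
    double eta;
    int cells;
};

// Omega^f for the assumed system x+ = a*x + u with growth bound
// beta(r, u) = |a| * r: an interval around the image of the cell centre.
std::pair<double, double> omega(const Grid1D& g, int cell, double u, double a) {
    const double centre = g.lower + (cell + 0.5) * g.eta;
    const double r = std::fabs(a) * (g.eta / 2.0);
    const double image = a * centre + u;
    return {image - r, image + r};
}

// Q: indices of the grid cells that intersect [lo, hi]. Parts of the interval
// that fall outside the grid are simply ignored in this sketch.
std::vector<int> quantize(const Grid1D& g, double lo, double hi) {
    assert(lo <= hi && g.eta > 0.0 && g.cells > 0);
    std::vector<int> hit;
    const double upper = g.lower + g.cells * g.eta;
    if (hi < g.lower || lo >= upper) return hit;
    const int first = std::max(0, static_cast<int>(std::floor((lo - g.lower) / g.eta)));
    const int last =
        std::min(g.cells - 1, static_cast<int>(std::floor((hi - g.lower) / g.eta)));
    for (int k = first; k <= last; ++k) hit.push_back(k);
    return hit;
}

// O^f = Q ∘ Omega^f: abstract successors of `cell` under input u.
std::vector<int> post(const Grid1D& g, int cell, double u, double a) {
    const std::pair<double, double> box = omega(g, cell, u, a);
    return quantize(g, box.first, box.second);
}
```

On a grid of four cells of width 0.5 over [−1, 1) with a = 0.5 and u = 0, the last cell has centre 0.75, image 0.375 and radius 0.125, so the interval [0.25, 0.5] touches cells 2 and 3. How a tool treats successors that leave the domain is a design choice that this sketch does not attempt to reproduce.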

We also assume that $O^f$ can be decomposed component-wise (i.e., for each dimension $i \in \{1, 2, \cdots, n\}$) such that for any $(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$, $O^f(\bar{x}, \bar{u}) = \bigcap_{i=1}^{n} D_i^f(O_i^f(P_i^f(\bar{x}), P_i^f(\bar{u})))$, where $O_i^f : P_i^f(\bar{X}) \times P_i^f(\bar{U}) \to 2^{P_i^f(\bar{X})}$ is an over-approximation function restricted to component $i \in \{1, 2, \cdots, n\}$ of $f$. The same assumption applies to the underlying characterization function $\Omega^f$.

**Algorithm 1:** Serial algorithm for constructing abstractions (SA).

**Input:** $\bar{X}, \bar{U}, O^f$

**Output:** A transition relation $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$.

1. $T \leftarrow \emptyset$ ; Initialize the set of transitions
2. **for all** $\bar{x} \in \bar{X}$ **do**
3. **for all** $\bar{u} \in \bar{U}$ **do**
4. **for all** $\bar{x}' \in O^f(\bar{x}, \bar{u})$ **do**
5. $T \leftarrow T \cup \{(\bar{x}, \bar{u}, \bar{x}')\}$ ; Add a new transition
6. **end**
7. **end**
8. **end**
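The triple loop of Algorithm 1 translates almost literally into code. The sketch below is only an illustration: the one-dimensional toy system and the names `serial_abstraction` and `toy_over_approx` are hypothetical and stand in for $O^f$.

```python
def serial_abstraction(states, inputs, over_approx):
    """Algorithm 1 (SA): enumerate every (x, u) pair and collect its successors."""
    transitions = set()
    for x in states:
        for u in inputs:
            for x_next in over_approx(x, u):
                transitions.add((x, u, x_next))
    return transitions


# Hypothetical 1-D system on cells 0..3: the input shifts the cell index and
# one extra cell of uncertainty is added; successors outside the grid are dropped.
STATES = range(4)
INPUTS = (-1, 1)


def toy_over_approx(x, u):
    return {y for y in (x + u, x + u + 1) if y in STATES}


T = serial_abstraction(STATES, INPUTS, toy_over_approx)
```

The cost is proportional to the number of state-input pairs, which is what makes the monolithic traversal expensive as the dimension grows.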

**Algorithm 2:** Serial sparsity-aware algorithm for constructing abstractions (Sparse-SA) as introduced in [2].

**Input:** $\bar{X}, \bar{U}, O^f$

**Output:** A transition relation $T \subseteq \bar{X} \times \bar{U} \times \bar{X}$.

1. $T \leftarrow \bar{X} \times \bar{U} \times \bar{X}$ ; Initialize the set of transitions
2. **for all** $i \in \{1, 2, \cdots, n\}$ **do**
3. $T\_i \leftarrow \mathrm{SA}(P\_i^f(\bar{X}), P\_i^f(\bar{U}), O\_i^f)$ ; Transitions of sub-spaces
4. $T \leftarrow T \cap D\_i^f(T\_i)$ ; Add transitions of sub-spaces
5. **end**

### **3 Sparsity-Aware Distributed Constructions of Abstractions**

Traditionally, constructing $\bar{\Sigma}$ is achieved monolithically and sequentially. This includes current state-of-the-art tools, e.g., SCOTS [9], PESSOA [6], CoSyMa [7], and SENSE [3]. More precisely, such tools serially traverse each element $(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$ to compute a set of transitions $\{(\bar{x}, \bar{u}, \bar{x}') \mid \bar{x}' \in O^f(\bar{x}, \bar{u})\}$. Algorithm 1 presents the traditional serial algorithm (denoted by SA) for constructing $\bar{\Sigma}$.

The drawback of this exhaustive search was mitigated by the technique introduced in [2], which utilizes the sparsity of $\bar{\Sigma}$. The authors suggest constructing $T$ by applying Algorithm 1 to subsets of each component. Algorithm 2 presents a sparsity-aware serial algorithm (denoted by Sparse-SA) for constructing $\bar{\Sigma}$, as introduced in [2]. If we assume a bounded number of elements in the subsets of each component (i.e., $|P\_i^f(\bar{X})|$ and $|P\_i^f(\bar{U})|$ from line 3 in Algorithm 2), we would expect a near-linear complexity of the algorithm. This is not evident in [2, Figure 3], as the authors decided to use Binary Decision Diagrams (BDDs) to represent the transition relation $T$.
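A minimal sketch of the idea behind Algorithm 2 follows, under several assumptions that are not in the text: the sparsity is given as explicit dependency lists (`STATE_DEPS`, `INPUT_DEPS`), each component table stores only the successors of its own coordinate, and the chain-like toy dynamics are invented for the example. Instead of materializing $\bar{X} \times \bar{U} \times \bar{X}$ and intersecting, the combination of the component tables is evaluated lazily for one pair at a time.

```python
import itertools


def project(vec, dims):
    """Keep only the coordinates listed in dims."""
    return tuple(vec[d] for d in dims)


def sparse_abstraction(axes_x, axes_u, state_deps, input_deps, over_approx_i):
    """Build one small transition table per component i."""
    tables = []
    for i in range(len(axes_x)):
        sub_x = list(itertools.product(*(axes_x[d] for d in state_deps[i])))
        sub_u = list(itertools.product(*(axes_u[d] for d in input_deps[i])))
        tables.append({(xs, us): over_approx_i(i, xs, us)
                       for xs in sub_x for us in sub_u})
    return tables


def successors(tables, state_deps, input_deps, x, u):
    """Combine the component tables for one full-dimensional pair (x, u)."""
    per_dim = [tables[i][(project(x, state_deps[i]), project(u, input_deps[i]))]
               for i in range(len(tables))]
    return set(itertools.product(*per_dim))


# Hypothetical 3-D chain: x1 is driven by the input, x2 follows x1, x3 follows x2.
AXES_X = [range(3)] * 3
AXES_U = [(-1, 0, 1)]
STATE_DEPS = [(0,), (0, 1), (1, 2)]
INPUT_DEPS = [(0,), (), ()]


def clip(v):
    return min(2, max(0, v))


def toy_over_approx(i, xs, us):
    base = xs[0] + us[0] if i == 0 else xs[0]
    return {clip(base), clip(base + 1)}


tables = sparse_abstraction(AXES_X, AXES_U, STATE_DEPS, INPUT_DEPS, toy_over_approx)
stored = sum(len(t) for t in tables)      # entries kept by the sparse tables
monolithic = 3 ** 3 * len(AXES_U[0])      # pairs a monolithic pass would visit
post = successors(tables, STATE_DEPS, INPUT_DEPS, x=(1, 0, 2), u=(1,))
```

Even in this tiny example the component tables hold fewer entries than the number of pairs visited by the monolithic loop, and the gap widens quickly with the dimension.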

Clearly, representing $T$ as a single storage entity is a drawback in Algorithm 2. All component-wise transition sets $T\_i$ will eventually need to push their results into $T$. This hinders any attempt to parallelize it unless a lock-free data structure is used, which affects the performance dramatically.

**Algorithm 3:** Proposed sparsity-aware parallel algorithm for constructing discrete abstractions.

**Input:** $\bar{X}, \bar{U}, \Omega^f$

**Output:** A list of characteristic sets: $K := \bigcup\_{p=1}^{P} \bigcup\_{i=1}^{n} K\_{loc,i}^{p}$.

1. **for all** $i \in \{1, 2, \cdots, n\}$ **do**
2. **for all** $p \in \{1, 2, \cdots, P\}$ **do**
3. $K\_{loc,i}^{p} \leftarrow \emptyset$ ; Initialize local containers
4. **end**
5. **end**
6. **for all** $i \in \{1, 2, \cdots, n\}$ **in parallel do**
7. **for all** $(\bar{x}, \bar{u}) \in P\_i^f(\bar{X}) \times P\_i^f(\bar{U})$ **in parallel with index** $j$ **do**
8. $p = I(i, j)$ ; Identify target PE
9. $(x\_{lb}, x\_{ub}) \leftarrow \Omega^f(\bar{x}, \bar{u})$ ; Calculate characteristics
10. $K\_{loc,i}^{p} \leftarrow K\_{loc,i}^{p} \cup \{(\bar{x}, \bar{u}, (x\_{lb}, x\_{ub}))\}$ ; Store characteristics
11. **end**
12. **end**

**Fig. 2.** An example of the task distribution for the parallel sparsity-aware abstraction.

On the other hand, Algorithm 2 in [4] introduces a technique for constructing $\bar{\Sigma}$ by using a distributed data container to maintain the transition set $T$ without constructing it explicitly. In [4], a continuous over-approximation $\Omega^f$ is favored over the discrete over-approximation $O^f$ since it requires less memory in practice. The actual computation of transitions (i.e., using $O^f$ to compute discrete successor states) is delayed to the synthesis phase and done on the fly. The parallel algorithm scales remarkably well with respect to the number of PEs, denoted by $P$, since the task is parallelizable with no data dependency. However, it still handles the problem monolithically, which means that, for a fixed $P$, it will probably not scale as the system dimension $n$ grows.

We then introduce Algorithm 3, which utilizes sparsity to construct $\bar{\Sigma}$ in parallel and is a combination of Algorithm 2 in [4] and Algorithm 2. The function $I : \mathbb{N}^{+} \setminus \{\infty\} \times \mathbb{N}^{+} \setminus \{\infty\} \to \{1, 2, \cdots, P\}$ maps a parallel job (i.e., lines 9 and 10 inside the inner **parallel for-all statement**), for a component $i$ and a tuple $(\bar{x}, \bar{u})$ with index $j$, to a PE with index $p = I(i, j)$. $K\_{loc,i}^{p}$ stores the characterizations of the abstraction of the $i$th component and is located in the PE with index $p$. Collectively, $K\_{loc,1}^{1}, \ldots, K\_{loc,i}^{p}, \ldots, K\_{loc,n}^{P}$ constitute a distributed container that stores the abstraction of the system.
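The job-to-PE mapping can be mimicked with a thread pool, as in the sketch below. This is only an illustration of the data layout: the names `build_characteristics`, `toy_omega`, and `index_map` are hypothetical, indices are 0-based, and CPython threads are used merely to show that each PE fills its own container without sharing; they do not reproduce the performance of pFaces.

```python
from concurrent.futures import ThreadPoolExecutor


def build_characteristics(sub_spaces, omega_i, num_pes, index_map):
    """Return one local container per PE: containers[p][i][(x, u)] = (x_lb, x_ub)."""
    # Pre-assign every job (i, j) to its PE so that no container is shared.
    jobs = [[] for _ in range(num_pes)]
    for i, pairs in enumerate(sub_spaces):
        for j, (x, u) in enumerate(pairs):
            jobs[index_map(i, j)].append((i, x, u))

    def run_pe(p):
        local = {}
        for i, x, u in jobs[p]:
            local.setdefault(i, {})[(x, u)] = omega_i(i, x, u)
        return local

    with ThreadPoolExecutor(max_workers=num_pes) as pool:
        return list(pool.map(run_pe, range(num_pes)))


# Hypothetical setup: 3 components, 4 (x, u) pairs each, P = 6 PEs,
# and every sub-space split into two halves.
SUB_SPACES = [[(x, u) for x in (0, 1) for u in (0, 1)] for _ in range(3)]


def toy_omega(i, x, u):
    centre = x + u
    return (centre - 0.5, centre + 0.5)


def index_map(i, j):
    return 2 * i + (0 if j < 2 else 1)


containers = build_characteristics(SUB_SPACES, toy_omega, 6, index_map)
```

Because every pair is assigned to exactly one PE before the parallel phase starts, no locking is needed while the characteristics are stored.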

Figure 2 depicts an example of the job and task distributions for the example presented in Fig. 1. Here, we use $P = 6$ with a mapping $I$ that distributes one partition element of one subset $P\_i^f(\bar{X}) \times P\_i^f(\bar{U})$ to one PE. We also assume that the used PEs have equal computation power. Consequently, we try to divide each subset $P\_i^f(\bar{X}) \times P\_i^f(\bar{U})$ into two equal partition elements such that we have, in total, 6 similar computation spaces. Inside each partition element, we indicate which distributed storage container $K\_{loc,i}^{p}$ is used.

**Fig. 3.** Comparison between the serial and parallel algorithms for constructing abstractions of a traffic network model by varying the dimensions.

To assess the distributed algorithm in comparison with the serial one presented in [2], we implement it in pFaces. We use the same traffic model presented in [2, Subsection VI-B] and the same parameters. For this example, the authors of [2] construct $T\_i$ for each component $i \in \{1, 2, \cdots, n\}$ and combine them incrementally in a BDD that represents $T$. A monolithic construction of $T$ from $T\_i$ is required in [2] since symbolic controller synthesis is done monolithically. On the other hand, using $K\_{loc,i}^{p}$ in our technique plays a major role in reducing the complexity of constructing higher dimensional abstractions. In Sect. 4, we utilize $K\_{loc,i}^{p}$ directly to synthesize symbolic controllers with no need to explicitly construct $T$.

Figure 3 depicts a comparison between the results reported in [2, Figure 3] and the ones obtained from our implementation in pFaces. We use an Intel Core i5 CPU, which comes equipped with an internal GPU, yielding around 24 PEs being utilized by pFaces. The implementation stores the distributed containers $K\_{loc,i}^{p}$ as raw data inside the memories of their corresponding PEs. As expected, the distributed algorithm scales linearly and we are able to go beyond 100 dimensions in a few seconds, whereas Figure 3 in [2] shows only abstractions up to a 51-dimensional traffic model because constructing the monolithic $T$ begins to incur an exponential cost for higher dimensions.

*Remark 1.* Both Algorithms 2 and 3 utilize the sparsity of $\Sigma$ to reduce the space complexity of abstractions from $|\bar{X} \times \bar{U}|$ to $\sum\_{i=1}^{n} |P\_i^f(\bar{X}) \times P\_i^f(\bar{U})|$. However, Algorithm 2 iterates over the space serially. Algorithm 3, on the other hand, handles the computation over the space in parallel using $P$ PEs.
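To get a feeling for the reduction in Remark 1, the following back-of-the-envelope computation compares the two sizes for a hypothetical system. All numbers (10 dimensions, 10 cells per dimension, two state dependencies per component, one input with 5 values) are assumptions chosen only for illustration.

```python
def monolithic_size(cells_per_dim, n, input_values):
    """|X x U| for n dimensions with equally many cells per dimension."""
    return cells_per_dim ** n * input_values


def sparse_size(cells_per_dim, n, deps_per_component, input_values):
    """Sum over the n components of |P_i(X) x P_i(U)| for equal-sized sub-spaces."""
    return n * cells_per_dim ** deps_per_component * input_values


full = monolithic_size(cells_per_dim=10, n=10, input_values=5)
sparse = sparse_size(cells_per_dim=10, n=10, deps_per_component=2, input_values=5)
```

Under these assumptions the monolithic space has 50 000 000 000 pairs, while the component-wise sub-spaces together contain 5 000.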

### **4 Sparsity-Aware Distributed Synthesis of Symbolic Controllers**

Given an abstract system $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$, we define the controllable predecessor map $CPre^{T} : 2^{\bar{X} \times \bar{U}} \to 2^{\bar{X} \times \bar{U}}$ for $Z \subseteq \bar{X} \times \bar{U}$ by:

$$CPre^{T}(Z) = \{ (\bar{x}, \bar{u}) \in \bar{X} \times \bar{U} \mid \emptyset \neq T(\bar{x}, \bar{u}) \subseteq \pi\_{\bar{X}}(Z) \},\tag{3}$$

where $T(\bar{x}, \bar{u})$ is an interpretation of the transition set $T$ as a map $T : \bar{X} \times \bar{U} \to 2^{\bar{X}}$ that evaluates a set of successor states from a state-input pair. Similarly, we introduce a component-wise controllable predecessor map $CPre^{T\_i} : 2^{P\_i^f(\bar{X}) \times P\_i^f(\bar{U})} \to 2^{P\_i^f(\bar{X}) \times P\_i^f(\bar{U})}$, for any component $i \in \{1, 2, \cdots, n\}$ and any $\tilde{Z} := P\_i^f(Z) := \pi\_{P\_i^f(\bar{X}) \times P\_i^f(\bar{U})}(Z)$, as follows:

$$CPre^{T\_i}(\tilde{Z}) = \{ (\bar{x}, \bar{u}) \in P\_i^f(\bar{X}) \times P\_i^f(\bar{U}) \mid \emptyset \neq T\_i(\bar{x}, \bar{u}) \subseteq \pi\_{\tilde{X}\_i}(\tilde{Z}) \}. \tag{4}$$

**Proposition 1.** *The following inclusion holds for any* $i \in \{1, 2, \cdots, n\}$ *and any* $Z \subseteq \bar{X} \times \bar{U}$*:*

$$P\_i^f(CPre^T(Z)) \subseteq CPre^{T\_i}(P\_i^f(Z)).$$

*Proof.* Consider an element $z\_p \in P\_i^f(CPre^T(Z))$. This implies that there exists $z \in \bar{X} \times \bar{U}$ such that $z \in CPre^T(Z)$ and $z\_p = P\_i^f(z)$. Consequently, $T\_i(z\_p) \neq \emptyset$ since $T(z) \neq \emptyset$. Also, since $z \in CPre^T(Z)$, we have $T(z) \subseteq \pi\_{\bar{X}}(Z)$. Now, recall how $T\_i$ is constructed as a component-wise set of transitions in line 3 of Algorithm 2. Then, we conclude that $T\_i(z\_p) \subseteq \pi\_{\bar{X}\_i}(P\_i^f(Z))$. Hence, the requirements in (4) are satisfied and $z\_p = (\bar{x}, \bar{u}) \in CPre^{T\_i}(\tilde{Z})$.
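The monolithic map in (3) can be sketched in a few lines once $T$ is stored as a dictionary from state-input pairs to successor sets. The toy abstraction below and the names `cpre`, `T`, and `Z_PSI` are hypothetical and serve only to illustrate the definition.

```python
def cpre(transitions, z):
    """Pairs (x, u) whose successor set is non-empty and lies inside pi_X(Z)."""
    winning_states = {x for x, _u in z}
    return {xu for xu, post in transitions.items()
            if post and post <= winning_states}


# Hypothetical abstraction on cells 0..3 with inputs -1 and +1.
T = {
    (0, -1): {0},    (0, 1): {1, 2},
    (1, -1): {0, 1}, (1, 1): {2, 3},
    (2, -1): {1, 2}, (2, 1): {3},
    (3, -1): {2, 3}, (3, 1): set(),
}
Z_PSI = {(3, -1), (3, 1)}   # every pair whose state is the target cell 3
pre = cpre(T, Z_PSI)
```

Only the pair `(2, 1)` qualifies here: it is the only pair whose successors are non-empty and all lie in the target cell.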

Here, we consider reachability and invariance specifications given by the LTL formulae ♦ψ and □ψ, respectively, where $\psi$ is a propositional formula over a set of atomic propositions $AP$. We first construct an initial winning set $Z\_\psi = \{(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U} \mid L(\bar{x}, \bar{u}) \models \psi\}$, where $L : \bar{X} \times \bar{U} \to 2^{AP}$ is some labeling function. For the sake of space, the rest of this section focuses on reachability specifications; a similar discussion can be pursued for invariance specifications.

Traditionally, to synthesize symbolic controllers for the reachability specifications ♦ψ, a monotone function:

$$\underline{G}(Z) := CPre^{T}(Z) \cup Z\_{\psi} \tag{5}$$

is employed to iteratively compute $Z\_\infty = \mu Z.\underline{G}(Z)$ starting with $Z\_0 = \emptyset$. Here, a notation from $\mu$-calculus is used with $\mu$ as the minimal fixed point operator and $Z \subseteq \bar{X} \times \bar{U}$ is the operated variable representing the set of winning pairs $(\bar{x}, \bar{u}) \in \bar{X} \times \bar{U}$. Set $Z\_\infty \subseteq \bar{X} \times \bar{U}$ represents the set of final winning pairs, after a finite number of iterations. Interested readers can find more details in [5] and the references therein. The transition map $T$ is used in this fixed-point computation and, hence, the technique suffers directly from the state-explosion problem. Algorithm 4 depicts a traditional serial algorithm of symbolic controller synthesis for reachability specifications. The synthesized controller is a map $C : \bar{X}\_w \to 2^{\bar{U}}$, where $\bar{X}\_w \subseteq \bar{X}$ represents a winning (a.k.a. controllable) set of states. Map $C$ is defined as: $C(\bar{x}) = \{\bar{u} \in \bar{U} \mid (\bar{x}, \bar{u}) \in \mu^{j(\bar{x})} Z.\underline{G}(Z)\}$, where $j(\bar{x}) = \inf\{i \in \mathbb{N} \mid \bar{x} \in \pi\_{\bar{X}}(\mu^{i} Z.\underline{G}(Z))\}$, and $\mu^{i} Z.\underline{G}(Z)$ represents the set of state-input pairs by the end of the $i$th iteration of the minimal fixed point computation.

**Algorithm 4:** Traditional serial algorithm to synthesize $C$ enforcing the specification ♦ψ.

**Input:** Initial winning domain $Z\_\psi \subset \bar{X} \times \bar{U}$ and $T$

**Output:** A controller $C : \bar{X}\_w \to 2^{\bar{U}}$.

1. $Z\_\infty \leftarrow \emptyset$ ; Initialize a running win-pairs set
2. $\bar{X}\_w \leftarrow \emptyset$ ; Initialize a running win-states set
3. **do**
4. $Z\_0 \leftarrow Z\_\infty$ ; Current win-pairs gets latest win-pairs
5. $Z\_\infty \leftarrow CPre^T(Z\_0) \cup Z\_\psi$ ; Update the running win-pairs set
6. $D \leftarrow Z\_\infty \setminus Z\_0$ ; Separate the new win-pairs
7. **foreach** $\bar{x} \in \pi\_{\bar{X}}(D)$ *with* $\bar{x} \notin \bar{X}\_w$ **do**
8. $\bar{X}\_w \leftarrow \bar{X}\_w \cup \{\bar{x}\}$ ; Add new win-states
9. $C(\bar{x}) := \{\bar{u} \in \bar{U} \mid (\bar{x}, \bar{u}) \in D\}$ ; Add new control actions
10. **end**
11. **while** $Z\_\infty \neq Z\_0$;
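The serial fixed-point iteration of Algorithm 4 can be sketched as follows. The dictionary-based transition map, the deterministic toy system, and the function names are assumptions made for this example only.

```python
def cpre(transitions, z):
    """Pairs (x, u) whose successor set is non-empty and lies inside pi_X(Z)."""
    winning_states = {x for x, _u in z}
    return {xu for xu, post in transitions.items()
            if post and post <= winning_states}


def synthesize_reachability(transitions, z_psi):
    """Iterate Z <- CPre(Z) | Z_psi until it stabilizes and record the controller."""
    z_inf, x_win, controller = set(), set(), {}
    while True:
        z_0 = z_inf
        z_inf = cpre(transitions, z_0) | z_psi
        new_pairs = z_inf - z_0
        # Only states that become winning in this iteration get control actions.
        for x in {x for x, _u in new_pairs} - x_win:
            x_win.add(x)
            controller[x] = {u for xx, u in new_pairs if xx == x}
        if z_inf == z_0:
            return controller, x_win


# Hypothetical deterministic system on cells 0..3; moves leaving the grid are disabled.
T = {(x, u): ({x + u} if 0 <= x + u <= 3 else set())
     for x in range(4) for u in (-1, 1)}
Z_PSI = {(3, -1), (3, 1)}   # target: cell 3
controller, x_win = synthesize_reachability(T, Z_PSI)
```

In this toy run every cell becomes winning, and each non-target cell is assigned the input `+1` that moves it towards cell 3, matching the definition of $C$ through the first iteration $j(\bar{x})$ at which a state enters the winning set.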

A parallel implementation that mitigates the complexity of the fixed-point computation is introduced in [4, Algorithm 4]. Briefly, for a set $Z \subseteq \bar{X} \times \bar{U}$, each iteration of $\mu Z.\underline{G}(Z)$ is computed via a parallel traversal of the complete space $\bar{X} \times \bar{U}$. Each PE is assigned a disjoint set of state-input pairs from $\bar{X} \times \bar{U}$ and it declares whether or not each pair belongs to the next winning pairs (i.e., $\underline{G}(Z)$). Although the algorithm scales well w.r.t. $P$, it still suffers from the state-explosion problem for a fixed $P$. We present a modified algorithm that utilizes sparsity to reduce the parallel search space at each iteration.

First, we introduce the component-wise monotone function:

$$\underline{G}\_i(Z) := CPre^{T\_i}(P\_i^f(Z)) \cup P\_i^f(Z\_\psi),\tag{6}$$

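for any $i \in \{1, 2, \cdots, n\}$ and any $Z \subseteq \bar{X} \times \bar{U}$. Now, an iteration in the sparsity-aware fixed-point can be summarized by the following three steps:

(1) Compute the component-wise sets $\underline{G}\_i(Z)$ for each component $i \in \{1, 2, \cdots, n\}$.

(2) Recover a monolithic set $[\underline{G}(Z)]$ from the component-wise sets using the maps $D\_i^f$:

$$[\underline{G}(Z)] := \bigcap\_{i=1}^{n} (D\_i^f(\underline{G}\_i(Z))).\tag{7}$$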

Note that [G(Z)] is an over-approximation of the monolithic set G(Z), which we prove in Theorem 1.

(3) Now, based on the next theorem, there is no need for a parallel search in $\bar{X} \times \bar{U}$ and the search can be done in $[\underline{G}(Z)]$. More accurately, the search for new elements in the next winning set can be done in $[\underline{G}(Z)] \setminus Z$.

**Theorem 1.** *Consider an abstract system* $\bar{\Sigma} = (\bar{X}, \bar{U}, T)$*. For any set* $Z \subseteq \bar{X} \times \bar{U}$*,* $\underline{G}(Z) \subseteq [\underline{G}(Z)]$*.*

*Proof.* Consider any element $z \in \underline{G}(Z)$. This implies that $z \in Z$, $z \in Z\_\psi$, or $z \in CPre^T(Z)$. We show that $z \in [\underline{G}(Z)]$ for any of these cases.


*Remark 2.* An initial computation of the controllable predecessor is done component-wise in step (1), which utilizes the sparsity of $\bar{\Sigma}$ and can be easily implemented in parallel. A monolithic search is required only in step (3). However, unlike the implementation in [4, Algorithm 4], the search is performed only over a subset of $\bar{X} \times \bar{U}$, namely $[\underline{G}(Z)] \setminus Z$.

Note that dynamical systems possess some locality property (i.e., starting from nearby states, successor states are also nearby) and an initial winning set will grow incrementally with each fixed-point iteration. This makes the set $[\underline{G}(Z)] \setminus Z$ relatively small w.r.t. $|\bar{X} \times \bar{U}|$. We clarify this and the result in Theorem 1 with a small example.

#### **4.1 An Illustrative Example**

For the sake of illustrating the proposed sparsity-aware synthesis technique, we provide a simple two-dimensional example. Consider a robot described by the following difference equation:

$$
\begin{bmatrix} x\_1^+ \\ x\_2^+ \end{bmatrix} = \begin{bmatrix} x\_1 + \tau u\_1 \\ x\_2 + \tau u\_2 \end{bmatrix},
$$

**Fig. 4.** A visualization of one arbitrary fixed-point iteration of the sparsity-aware synthesis technique for a two-dimensional robot system.

**Fig. 5.** The evolution of the fixed-point sets for the robot example by the end of fixed-point iterations 5 (left side) and 228 (right side). A video of all iterations can be found in: http://goo.gl/aegznf.

where $(x\_1, x\_2) \in \bar{X} := \bar{X}\_1 \times \bar{X}\_2$ is a state vector and $(u\_1, u\_2) \in \bar{U} := \bar{U}\_1 \times \bar{U}\_2$ is an input vector. Figure 4 shows a visualization of the sets related to this sparsity-aware technique for symbolic controller synthesis for one fixed-point iteration. Set $Z\_\psi$ is the initial winning set (a.k.a. target set for reachability specifications) constructed from a given specification (e.g., a region in $\bar{X}$ to be reached by the robot) and $Z$ is the winning set of the current fixed-point iteration. For simplicity, all sets are projected on $\bar{X}$ and the readers can think of $\bar{U}$ as an additional dimension perpendicular to the surface of this paper.

As depicted in Fig. 4, the next winning set $\underline{G}(Z)$ is over-approximated by $[\underline{G}(Z)]$, as a result of Theorem 1. Algorithm 4 in [4] searches for $\underline{G}(Z)$ in $(\bar{X}\_1 \times \bar{X}\_2) \times (\bar{U}\_1 \times \bar{U}\_2)$. This work suggests searching for $\underline{G}(Z)$ in $[\underline{G}(Z)] \setminus Z$ instead.
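One iteration of the sparsity-aware search can be reproduced on a small grid version of the robot. Everything concrete in the sketch below is an assumption made for illustration: a 5 x 5 grid, unit moves per axis, a single target cell, and a fully decoupled system in which component $i$ depends only on $(x\_i, u\_i)$, so that projecting and lifting reduce to picking and recombining coordinates.

```python
import itertools

N = 5                # cells per axis (hypothetical)
U_AXIS = (-1, 0, 1)  # per-axis inputs (hypothetical)


def t_axis(x, u):
    """Component-wise successors; moves leaving the grid are disabled."""
    y = x + u
    return {y} if 0 <= y < N else set()


def t_full(x, u):
    """Monolithic successors of the decoupled 2-D system."""
    return set(itertools.product(t_axis(x[0], u[0]), t_axis(x[1], u[1])))


def project(z, i):
    return {(x[i], u[i]) for x, u in z}


def cpre_axis(z_axis):
    win = {x for x, _u in z_axis}
    return {(x, u) for x in range(N) for u in U_AXIS
            if t_axis(x, u) and t_axis(x, u) <= win}


def g_monolithic(z, z_psi):
    win = {x for x, _u in z}
    pairs = itertools.product(itertools.product(range(N), repeat=2),
                              itertools.product(U_AXIS, repeat=2))
    return {(x, u) for x, u in pairs
            if t_full(x, u) and t_full(x, u) <= win} | z_psi


def g_bracket(z, z_psi):
    """Recombine the component-wise sets G_i(Z) into the over-approximation [G(Z)]."""
    g = [cpre_axis(project(z, i)) | project(z_psi, i) for i in range(2)]
    return {((x1, x2), (u1, u2)) for x1, u1 in g[0] for x2, u2 in g[1]}


Z_PSI = {((4, 4), u) for u in itertools.product(U_AXIS, repeat=2)}
Z = set(Z_PSI)                               # winning pairs after the first iteration
exact = g_monolithic(Z, Z_PSI)
approx = g_bracket(Z, Z_PSI)
candidates = approx - Z                      # the only pairs that need to be checked
win_states = {x for x, _u in Z}
new_winners = {(x, u) for x, u in candidates
               if t_full(x, u) and t_full(x, u) <= win_states}
```

In this toy setting the restricted search inspects 7 candidate pairs instead of all 225 pairs of the full space, and it still finds exactly the new winning pairs of the monolithic iteration.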

#### **4.2 A Sparsity-Aware Parallel Algorithm for Symbolic Controller Synthesis**

We propose Algorithm 5 to parallelize sparsity-aware controller synthesis. The main differences between this and Algorithm 4 in [4] are in lines 9–12, which correspond to computing $[\underline{G}(Z)]$ at each iteration of the fixed-point computation. Line 13 is modified to do the parallel search inside $[\underline{G}(Z)] \setminus Z$ instead of $\bar{X} \times \bar{U}$ in the original algorithm. The rest of the algorithm is well documented in [4].

**Algorithm 5:** Proposed parallel sparsity-aware algorithm to synthesize $C$ enforcing specification ♦ψ.

**Input:** Initial winning domain $Z\_\psi \subset \bar{X} \times \bar{U}$ and $T$

**Output:** A controller $C : \bar{X}\_w \to 2^{\bar{U}}$.

1. $Z\_\infty \leftarrow \emptyset$ ; Initialize a shared win-pairs set
2. $\bar{X}\_w \leftarrow \emptyset$ ; Initialize a shared win-states set
3. **do**
4. $Z\_0 \leftarrow Z\_\infty$ ; Current win-pairs set gets latest win-pairs
5. **for all** $p \in \{1, 2, \cdots, P\}$ **do**
6. $Z\_{loc}^{p} \leftarrow \emptyset$ ; Initialize a local win-pairs set
7. $\bar{X}\_{w,loc}^{p} \leftarrow \emptyset$ ; Initialize a local win-states set
8. **end**
9. $[\underline{G}] \leftarrow \bar{X} \times \bar{U}$ ; Initialize $[\underline{G}(Z)]$
10. **for all** $i \in \{1, 2, \cdots, n\}$ **do**
11. $[\underline{G}] \leftarrow [\underline{G}] \cap D\_i^f(\underline{G}\_i(Z\_\infty))$ ; Over-approximate
12. **end**
13. **for all** $(\bar{x}, \bar{u}) \in [\underline{G}] \setminus Z\_\infty$ **in parallel with index** $j$ **do**
14. $p = I(i)$ ; Identify a PE
15. $Posts \leftarrow Q \circ K\_{loc}^{p}(\bar{x}, \bar{u})$ ; Compute successor states
16. **if** $Posts \subseteq Z\_0 \cup Z\_\psi$ **then**
17. $Z\_{loc}^{p} \leftarrow Z\_{loc}^{p} \cup \{(\bar{x}, \bar{u})\}$ ; Record a winning pair
18. $\bar{X}\_{w,loc}^{p} \leftarrow \bar{X}\_{w,loc}^{p} \cup \{\bar{x}\}$ ; Record a winning state
19. **if** $\bar{x} \notin \pi\_{\bar{X}}(Z\_0)$ **then**
20. $C(\bar{x}) \leftarrow C(\bar{x}) \cup \{\bar{u}\}$ ; Record a control action
21. **end**
22. **end**
23. **end**
24. **for all** $p \in \{1, 2, \cdots, P\}$ **do**
25. $Z\_\infty \leftarrow Z\_\infty \cup Z\_{loc}^{p}$ ; Update the shared win-pairs set
26. $\bar{X}\_w \leftarrow \bar{X}\_w \cup \bar{X}\_{w,loc}^{p}$ ; Update the shared win-states set
27. **end**
28. **while** $Z\_\infty \neq Z\_0$;

The algorithm is implemented in pFaces as updated versions of the kernels GBFP and GBFP<sup>m</sup> in [4]. We synthesize a reachability controller for the robot example presented earlier. Figure 5 shows an arena with obstacles depicted as red boxes, together with the results at fixed-point iterations 5 and 228. The blue box indicates the target set (i.e., $Z\_\psi$). The purple region indicates the current winning states. The orange region indicates $[\underline{G}(Z)] \setminus Z$. The black box is the next search region, which is a rectangular over-approximation of $[\underline{G}(Z)] \setminus Z$. We over-approximate $[\underline{G}(Z)] \setminus Z$ with such a rectangle as it is straightforward for PEs in pFaces to work with rectangular parallel jobs. The synthesis problem is solved in 322 fixed-point iterations. Unlike the parallel algorithm in [4], which searches for the next winning region inside $\bar{X} \times \bar{U}$ at each iteration, the implementation of the proposed algorithm reduces the parallel search by an average of 87% when searching inside the black boxes in each iteration.
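The rectangular over-approximation of the search region amounts to an axis-aligned bounding box over cell indices. The following sketch shows one way to compute it and the resulting reduction of the search space; the function names and the toy region are hypothetical, and this is not the pFaces kernel code.

```python
from math import prod


def bounding_box(cells):
    """Smallest axis-aligned box (lo, hi), inclusive, containing all index tuples."""
    cells = list(cells)
    if not cells:
        raise ValueError("empty search region")
    dims = range(len(cells[0]))
    lo = tuple(min(c[d] for c in cells) for d in dims)
    hi = tuple(max(c[d] for c in cells) for d in dims)
    return lo, hi


def search_reduction(cells, full_shape):
    """Fraction of the full index space that the bounding box lets us skip."""
    lo, hi = bounding_box(cells)
    box = prod(h - l + 1 for l, h in zip(lo, hi))
    return 1.0 - box / prod(full_shape)


# Hypothetical candidate region inside a 10 x 10 index space.
region = {(2, 3), (3, 3), (3, 4), (4, 2)}
lo, hi = bounding_box(region)
saved = search_reduction(region, (10, 10))
```

The box may contain cells that are not in the region, so it trades a slightly larger search for a job shape that is easy to distribute.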

**Fig. 6.** An autonomous vehicle trying to avoid a sudden obstacle on the highway.

### **5 Case Study: Autonomous Vehicle**

We consider a vehicle described by the following 7-dimensional discrete-time single track (ST) model [1]:

$$\begin{cases} x\_1^+ = x\_1 + \tau x\_4 \cos(x\_5 + x\_7), \\ x\_2^+ = x\_2 + \tau x\_4 \sin(x\_5 + x\_7), \\ x\_3^+ = x\_3 + \tau u\_1, \\ x\_4^+ = x\_4 + \tau u\_2, \\ x\_5^+ = x\_5 + \tau x\_6, \\ x\_6^+ = x\_6 + \frac{\tau \mu m}{I\_z(l\_r + l\_f)} \Big( l\_f C\_{S,f}(g l\_r - u\_2 h\_{cg}) x\_3 + \big(l\_r C\_{S,r}(g l\_f + u\_2 h\_{cg}) - l\_f C\_{S,f}(g l\_r - u\_2 h\_{cg})\big) x\_7 \\ \phantom{x\_6^+ = x\_6 +} - \big(l\_f^2 C\_{S,f}(g l\_r - u\_2 h\_{cg}) + l\_r^2 C\_{S,r}(g l\_f + u\_2 h\_{cg})\big) \frac{x\_6}{x\_4} \Big), \\ x\_7^+ = x\_7 + \frac{\tau \mu}{x\_4 (l\_r + l\_f)} \Big( C\_{S,f}(g l\_r - u\_2 h\_{cg}) x\_3 - \big(C\_{S,r}(g l\_f + u\_2 h\_{cg}) + C\_{S,f}(g l\_r - u\_2 h\_{cg})\big) x\_7 \\ \phantom{x\_7^+ = x\_7 +} + \big(C\_{S,r}(g l\_f + u\_2 h\_{cg}) l\_r - C\_{S,f}(g l\_r - u\_2 h\_{cg}) l\_f\big) \frac{x\_6}{x\_4} \Big) - x\_6, \end{cases}$$

where $x\_1$ and $x\_2$ are the position coordinates, $x\_3$ is the steering angle, $x\_4$ is the heading velocity, $x\_5$ is the yaw angle, $x\_6$ is the yaw rate, and $x\_7$ is the slip angle. Variables $u\_1$ and $u\_2$ are inputs and they control the steering angle and heading velocity, respectively. Input and state variables are all members of $\mathbb{R}$. The model takes tire slip into account, making it a good candidate for studies that consider planning of evasive maneuvers that are very close to the physical limits. We consider an update period $\tau = 0.1$ s and the following parameters for a BMW 320i car: $m = 1093$ [kg] as the total mass of the vehicle, $\mu = 1.048$ as the friction coefficient, $l\_f = 1.156$ [m] as the distance from the front axle to the center of gravity (CoG), $l\_r = 1.422$ [m] as the distance from the rear axle to the CoG, $h\_{cg} = 0.574$ [m] as the height of the CoG, $I\_z = 1791.0$ [kg m$^2$] as the moment of inertia of the entire mass around the $z$ axis, $C\_{S,f} = 20.89$ [1/rad] as the front cornering stiffness coefficient, and $C\_{S,r} = 19.89$ [1/rad] as the rear cornering stiffness coefficient.
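The update map can be evaluated numerically as in the sketch below. Two points are assumptions rather than facts from the text: the gravitational constant `G_ACC = 9.81` is not listed among the parameters, and the yaw-rate and slip-angle updates follow one reading of the single-track model of [1], so they should be checked against that reference before use.

```python
import math

TAU = 0.1                      # update period [s]
G_ACC = 9.81                   # gravitational acceleration [m/s^2] (assumed value)
M, MU = 1093.0, 1.048          # mass [kg], friction coefficient
L_F, L_R, H_CG = 1.156, 1.422, 0.574
I_Z = 1791.0
C_SF, C_SR = 20.89, 19.89


def st_step(x, u):
    """One step of the 7-dimensional single-track model (sketch, see assumptions)."""
    x1, x2, x3, x4, x5, x6, x7 = x
    u1, u2 = u
    front = C_SF * (G_ACC * L_R - u2 * H_CG)
    rear = C_SR * (G_ACC * L_F + u2 * H_CG)
    yaw_acc = (MU * M / (I_Z * (L_R + L_F))) * (
        L_F * front * x3
        + (L_R * rear - L_F * front) * x7
        - (L_F ** 2 * front + L_R ** 2 * rear) * x6 / x4)
    slip_term = (MU / (x4 * (L_R + L_F))) * (
        front * x3
        - (rear + front) * x7
        + (rear * L_R - front * L_F) * x6 / x4)
    return (
        x1 + TAU * x4 * math.cos(x5 + x7),
        x2 + TAU * x4 * math.sin(x5 + x7),
        x3 + TAU * u1,
        x4 + TAU * u2,
        x5 + TAU * x6,
        x6 + TAU * yaw_acc,
        # The trailing "- x6" is kept as printed in the model; an Euler
        # discretization would normally scale it by TAU as well.
        x7 + TAU * slip_term - x6,
    )


# Straight driving at 15 m/s with no steering and no acceleration.
nxt = st_step((0.0, 0.0, 0.0, 15.0, 0.0, 0.0, 0.0), (0.0, 0.0))
```

For straight driving all angle-related states stay at zero and only the longitudinal position advances by $\tau x\_4$, which is a quick sanity check for any implementation of the model.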

To construct an abstract system $\bar{\Sigma}$, we consider a bounded version of the state set $X := [0, 84] \times [0, 6] \times [-0.18, 0.8] \times [12, 21] \times [-0.5, 0.5] \times [-0.8, 0.8] \times [-0.1, 0.1]$, a state quantization vector $\eta\_X = (1.0, 1.0, 0.01, 3.0, 0.05, 0.1, 0.02)$, an input set $U := [-0.4, 0.4] \times [-4, 4]$, and an input quantization vector $\eta\_U = (0.1, 0.5)$.
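The size of the resulting abstraction can be estimated directly from these bounds and quantization vectors. The sketch below assumes that a dimension with bounds $[lb, ub]$ and step $\eta$ contributes $\lfloor (ub - lb)/\eta \rfloor + 1$ grid points; this convention is an assumption, and a tool may count cells differently. Exact rational arithmetic is used to avoid rounding issues with steps such as 0.01.

```python
from fractions import Fraction
from math import prod


def grid_points(lb, ub, eta):
    """Number of grid points in [lb, ub] with spacing eta (assumed convention)."""
    return (Fraction(ub) - Fraction(lb)) // Fraction(eta) + 1


STATE_BOUNDS = [("0", "84"), ("0", "6"), ("-0.18", "0.8"), ("12", "21"),
                ("-0.5", "0.5"), ("-0.8", "0.8"), ("-0.1", "0.1")]
ETA_X = ["1.0", "1.0", "0.01", "3.0", "0.05", "0.1", "0.02"]
INPUT_BOUNDS = [("-0.4", "0.4"), ("-4", "4")]
ETA_U = ["0.1", "0.5"]

state_counts = [grid_points(lb, ub, e) for (lb, ub), e in zip(STATE_BOUNDS, ETA_X)]
input_counts = [grid_points(lb, ub, e) for (lb, ub), e in zip(INPUT_BOUNDS, ETA_U)]
pairs = prod(state_counts) * prod(input_counts)
```

Under this counting convention the monolithic space of state-input pairs is on the order of $10^{11}$, which illustrates why a traversal of the full space at every fixed-point iteration is costly.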


**Table 1.** Used HW configurations for testing the proposed technique.

**Table 2.** Results obtained after running the experiments EX1 and EX2.


We are interested in an autonomous operation of the vehicle on a highway. Consider a situation on a two-lane highway where an accident suddenly happens in the lane in which our vehicle is traveling. The vehicle's controller should find a safe maneuver to avoid crashing into the suddenly appearing obstacle. Figure 6 depicts such a situation. We over-approximate the obstacle with the hyper-box $[28, 50] \times [0, 3] \times [-0.18, 0.8] \times [12, 21] \times [-0.5, 0.5] \times [-0.8, 0.8] \times [-0.1, 0.1]$.

We run the implementation on different HW configurations. We use a local machine and instances from the Amazon Web Services (AWS) cloud computing services. Table 1 summarizes those configurations. We also run two different experiments. For the first one (denoted by EX1), the goal is only to avoid the crash with the obstacle. We use a smaller version of the original state set, $X := [0, 50] \times [0, 6] \times [-0.18, 0.8] \times [11, 19] \times [-0.5, 0.5] \times [-0.8, 0.8] \times [-0.1, 0.1]$. The second one (denoted by EX2) targets the full-sized highway window (84 m), and the goal is to avoid colliding with the obstacle and get back to the right lane. Table 2 reports the obtained results. The reported times are for constructing finite abstractions of the vehicle and synthesizing symbolic controllers. Note that our results easily outperform the initial kernels in pFaces, which themselves outperform serial implementations with speedups of up to 30000x, as reported in [4]. The speedup in EX1 is higher as the obstacle occupies a relatively larger volume of the state space. This makes $[\underline{G}(Z)] \setminus Z$ smaller and, hence, the search faster for our implementation.

#### **6 Conclusion and Future Work**

A unified approach that utilizes sparsity of the interconnection structure in dynamical systems is introduced for the construction of finite abstractions and synthesis of their symbolic controllers. In addition, parallel algorithms are designed to target HPC platforms and they are implemented within the framework of pFaces. The results show remarkable reductions in computation times. We showed the effectiveness of the results on a 7-dimensional model of a BMW 320i car by designing a controller to keep the car in the travel lane unless it is blocked.

The technique still suffers from the memory inefficiency inherited from pFaces. More specifically, the data used during the computation of abstractions and the synthesis of symbolic controllers is not encoded, and using raw data requires larger amounts of memory. Future work will focus on designing distributed data structures that achieve a balance between memory size and access time.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Symbolic Verification

# **iRank: A Variable Order Metric for DEDS Subject to Linear Invariants**

Elvio Gilberto Amparore<sup>1</sup>, Gianfranco Ciardo<sup>2</sup>, Susanna Donatelli<sup>1(B)</sup>, and Andrew Miner<sup>2</sup>

<sup>1</sup> Dipartimento di Informatica, Università di Torino, Torino, Italy {amparore,donatelli}@di.unito.it

<sup>2</sup> Iowa State University, Ames, IA, USA {ciardo,asminer}@iastate.edu

**Abstract.** Finding good variable orders for decision diagrams is essential for their effective use. We consider Multiway Decision Diagrams (MDDs) encoding a set of fixed-size vectors satisfying a set of linear invariants. Two critical applications of this problem are encoding the state space of a discrete-event discrete state system (DEDS) and encoding all solutions to a set of integer constraints. After studying the relations between the MDD structure and the constraints imposed by the linear invariants, we define iRank, a new variable order metric that exploits the knowledge embedded in these invariants. We evaluate iRank against other previously proposed metrics on a benchmark of 40 different DEDS and show that it is a better predictor of the MDD size and it is better at driving heuristics for the generation of good variable orders.

**Keywords:** Decision diagrams · Variable order metrics and computation

### **1 Introduction**

Decision diagrams (DDs) are a popular data structure to encode large sets of structured data, for example vectors whose elements take values over finite domains, but it is well-known [10] that the size of the DD strongly depends on how the structure of the data (its "variables") is mapped to the structure of the DD (its "levels"). The problem of determining the association of variable(s) to levels is the "variable ordering problem" and it is known that finding an optimal order is an NP-complete problem [9] for any DD class, including binary DDs (BDDs [10]) and multiway DDs (MDDs [19]). This has given rise to a variety of metrics (to compare the effectiveness of two orders without actually building the corresponding DDs) and of heuristics (to compute sub-optimal orders, often by attempting to optimize a given metric). DDs play a central role in many system verification tools [4,11,14,20,22], where they typically support state space exploration. Tools often make use of general-purpose DD libraries [8,18,26,27]. Libraries typically support dynamic reordering to improve the current order at run-time, while the definition of an initial order (static ordering) is typically up to the verification tool, which can rely on domain knowledge. The two problems are synergistic: reordering works better if the initial order is at least fairly good.

Our research seeks to find good variable order metrics and good variable order heuristics for MDDs encoding sets of fixed-size vectors, when these vectors satisfy some linear invariants. We want to answer whether it is possible to *leverage invariant information to define effective metrics and heuristics for variable order*. Two applications where this is important are *encoding the state space of a discrete-event discrete state system* and *encoding all solutions to a set of integer constraints*. In this paper, we concentrate on the first problem, but also address a special case of the second. Specifically, we study the relationship between MDDs and linear invariants with integer coefficients, and define two new metrics, PF and iRank, and associated heuristic and meta-heuristic. PF and iRank exploit the constraint imposed by the invariants. Our evaluation shows that iRank is superior to any other metric we consider, in all experiments we performed.

We do not discuss the state of the art on heuristics (see [7] for a full survey), but only metrics and how metric optimization can guide a meta-heuristic. After the necessary background in Sect. 2, Sects. 3 and 4 define the metrics PF and iRank, based on a number of observations and propositions on the relation between MDDs and invariants. Section 5 experimentally evaluates the two metrics against several other metrics on 40 different models, considering thousands of variable orders. Section 6 summarizes our results and discusses future work.

### **2 Background**

Let 𝔹 = {⊥, ⊤}, ℕ, and ℤ denote the sets of booleans, natural numbers, and integers, respectively. All other sets are denoted by calligraphic letters, e.g., A.

#### **2.1 Discrete-Event Discrete-State Systems and Their State Space**

A *discrete-event discrete-state system* can be generally described by providing:


The *reachable* states are S*rch* = {**m** : ∃e1,...,e*n* ∈ E, **m***init* →<sup>e1</sup> ··· →<sup>e*n*</sup> **m**} and, for such a system, an *invariant* is a boolean function f : S*pot* → 𝔹 with the property that it evaluates to ⊤ in all reachable states: **m** ∈ S*rch* ⇒ f(**m**), while it may be either ⊤ or ⊥ in the unreachable states S*pot* \ S*rch*.

We specify DEDSs as Petri nets, because of their widespread use and the large body of literature on Petri net invariants. In Petri net terminology, the evaluation of variables describes the number of *tokens* in the set P of *places* (thus the state, or *marking*, is a vector in ℕ<sup>P</sup>), the events E correspond to the *transitions* T, while two ℕ<sup>P×T</sup> matrices **C**<sup>−</sup> and **C**<sup>+</sup> define the system evolution. *Effect*(t, **m**) = **m** + **C**<sup>+</sup>[P, t] − **C**<sup>−</sup>[P, t] (transition firing) iff **m** ≥ **C**<sup>−</sup>[P, t], otherwise *Effect*(t, **m**) = ◦, i.e., t is disabled in **m**, where ≥ is interpreted componentwise. The *incidence* matrix is **C** = **C**<sup>+</sup> − **C**<sup>−</sup>, so the net change to the marking caused by firing transition t is **C**[P, t]. Figure 1 shows two Petri nets used as running example. Places are shown as circles, transitions as bars, and **C**<sup>−</sup> (**C**<sup>+</sup>) as incoming (outgoing) arcs for transitions with the corresponding value in **C**<sup>−</sup> (**C**<sup>+</sup>) shown on the arc (omitted if 1). The incidence matrix and initial marking are next to the nets.
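The firing rule can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: the one-transition net with an arc of weight 2, the names `C_MINUS`, `C_PLUS`, and `effect`, and the use of `None` to stand for a disabled transition are all hypothetical choices made for the example, not part of the paper or of any tool.

```python
# Hypothetical net (an assumption): transition t0 consumes 2 tokens from p0
# and produces 1 token in p1. Rows are places, columns are transitions.
C_MINUS = [[2],
           [0]]
C_PLUS = [[0],
          [1]]

def effect(t, m):
    """Return the marking reached by firing t in m, or None if t is disabled."""
    places = range(len(m))
    # t is enabled iff m >= C-[P, t] componentwise.
    if any(m[p] < C_MINUS[p][t] for p in places):
        return None
    return tuple(m[p] + C_PLUS[p][t] - C_MINUS[p][t] for p in places)

print(effect(0, (3, 0)))  # (1, 1)
print(effect(0, (1, 1)))  # None: p0 holds fewer than 2 tokens
```

The net change of the enabled firing, (−2, +1), is exactly the column **C**[P, t0] of the incidence matrix of this hypothetical net.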

A *p-flow* is a vector *π* ∈ ℤ<sup>P</sup> \ {**0**} such that *π*<sup>T</sup> · **C** = **0**, and its *support* is *Supp*(*π*) = {v ∈ P : *π*[v] ≠ 0}. A p-flow *π* implies a *linear invariant* of the form ∀**m** ∈ S*rch* : *π*<sup>T</sup> · **m** = *π*<sup>T</sup> · **m***init*, where *π*<sup>T</sup> · **m***init* = *Tc*(*π*) is a constant value, the *token count* of the invariant, which depends only on **m***init*. If clear from the context, *π* may refer to either a p-flow or the implied invariant.

P-flows with no negative entries are called *p-semiflows*. Let F be the set of p-flows, F<sup>+</sup> the set of p-semiflows, and F<sup>−</sup> = F \ F<sup>+</sup> the p-flows that are not p-semiflows. Since multiplying a p-flow by a non-zero integer results in a p-flow, these sets are either empty or infinite. Figure 1 shows the *minimal* p-flows (defined later) as column vectors, with the token count below the vector.

A p-semiflow *π* describes a *conservative invariant*, which implies a *bound* **m**[v] ≤ *Tc*(*π*)/*π*[v] on the number of tokens in each place v of the support of *π* for any reachable marking **m**. Column "bnd" in Fig. 1 reports these bounds. The two p-semiflows in Fig. 1(A) express the following invariants:

$$\begin{aligned} f\_1: \quad \forall \mathbf{m} \in \mathcal{S}\_{rch}, \ \mathbf{m}[P\_0] + \mathbf{m}[P\_{1b}] + \mathbf{m}[P\_{2b}] + \mathbf{m}[P\_{3b}] = 2 \\\ f\_2: \quad \forall \mathbf{m} \in \mathcal{S}\_{rch}, \ \mathbf{m}[P\_0] + \mathbf{m}[P\_{1a}] + \mathbf{m}[P\_{2a}] + \mathbf{m}[P\_{3a}] = 2. \end{aligned}$$

These in turn imply that the number of tokens in each place is bounded by 2. We assume that each place v is bounded by some n*v* ∈ ℕ, and redefine S*pot* as ×<sub>v∈P</sub> {0, 1,...,n*v*}. This ensures that S*rch* is finite and therefore can be encoded in a (large enough but finite) MDD. Such bounds exist if the Petri net is covered by conservative invariants, i.e., each place is in the support of some p-semiflow.
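The definitions of p-flow, token count, and place bound can be checked mechanically. The following sketch uses a hypothetical three-place cycle (not one of the nets of Fig. 1, whose incidence matrices are not reproduced here); the function names are illustrative, and rounding the bound down to an integer is an assumption justified only by markings being integer vectors.

```python
# Hypothetical cycle net (an assumption): t0: p0 -> p1, t1: p1 -> p2, t2: p2 -> p0.
# Incidence matrix C, rows are places, columns are transitions.
C = [[-1, 0, 1],
     [1, -1, 0],
     [0, 1, -1]]
M_INIT = (2, 0, 0)

def is_p_flow(pi, c):
    """True if pi is a nonzero vector with pi^T * C = 0."""
    num_places, num_transitions = len(c), len(c[0])
    return any(pi) and all(
        sum(pi[p] * c[p][t] for p in range(num_places)) == 0
        for t in range(num_transitions)
    )

def token_count(pi, m_init):
    """Tc(pi) = pi^T * m_init."""
    return sum(a * b for a, b in zip(pi, m_init))

def bounds(pi, m_init):
    """Bound Tc(pi) / pi[v] (rounded down) for each place v in the support
    of a p-semiflow pi."""
    tc = token_count(pi, m_init)
    return {v: tc // coeff for v, coeff in enumerate(pi) if coeff != 0}

print(is_p_flow((1, 1, 1), C))    # True
print(is_p_flow((1, 2, 1), C))    # False
print(bounds((1, 1, 1), M_INIT))  # {0: 2, 1: 2, 2: 2}
```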

Work on Petri net invariants has mainly targeted F<sup>+</sup> rather than F, possibly because it is easier to compute properties, like the bounds of places, with F<sup>+</sup>. On the other hand, F can be characterized by a basis (whose size equals the dimension of the null space of **C**, thus cannot exceed the smaller of |P| and |T|), while F<sup>+</sup> can only be characterized by a *minimal generator*, the smallest set of vectors that can generate its elements through non-negative integer linear combinations of its elements. It has been shown [15] that this set is finite, is unique (thus we can denote it as F<sup>+</sup>*min*), and consists of all *minimal* p-semiflows (where a p-semiflow is minimal if the g.c.d. of its coefficients is 1 and its support does not strictly contain the support of another p-semiflow). However, F<sup>+</sup>*min* may have size exponential in |P|. A classic example of this is a Petri net sequence of fork and join models with n + 1 transitions and 2n + 1 places whose F<sup>+</sup>*min* has size 2<sup>n</sup>. Figure 1(B) shows the case n = 3. The reader can find in [15] full details and a thorough analysis of the cost of computing F<sup>+</sup>*min*.

**Fig. 1.** Two Petri nets, their incidence matrices, and their p-flows.

In addition to S*rch*, we can define S*sat* = {**m** ∈ ℕ<sup>P</sup> : ∀*π* ∈ F, **m** · *π* = *Tc*(*π*)}. Obviously S*rch* ⊆ S*sat*. We let S refer to either when the distinction is not relevant. Note that S*sat* is a superset of the *linearized reachability set* [21] {**m** ∈ ℕ<sup>P</sup> : ∃ **y** ∈ ℕ<sup>T</sup>, **m** = **m***init* + **C** · **y**<sup>T</sup>}, used in Petri net theory to devise a semi-decidable procedure for safety properties.

#### **2.2 Multiway Decision Diagrams**

**Definition 1 (MDD).** Given a *global domain* X = ×<sup>L</sup><sub>k=1</sub> X*k*, where each *local domain* X*k* is of the form {0, 1,...,n*k*} for some n*k* ∈ ℕ, an (ordered, quasi-reduced) MDD over X is a directed acyclic graph with exactly two terminal nodes, ⊤ and ⊥, at *level* 0 (we write ⊤.*lvl* = ⊥.*lvl* = 0), with each non-terminal node p at some level p.*lvl* = k ∈ {1,...,L} having one outgoing edge for each i ∈ X*k*, pointing to a node p[i] at level k−1 or to ⊥, and with no *duplicates* (there cannot be nodes p and q at level k with p[i] = q[i] for all i ∈ X*k*) or *redundant* nodes (node p at level k is redundant if p[0] = p[i] for all i ∈ X*k*) pointing to ⊥. The function f*p* : X → 𝔹 encoded by an MDD node p is recursively defined as f*p*(i1,...,i*L*) = f<sub>p[i<sub>k</sub>]</sub>(i1,...,i*L*) if p.*lvl* = k > 0, and f*p*(i1,...,i*L*) = p if p.*lvl* = 0. Interpreting f*p* as an indicator function, p also encodes the set S*p* ⊆ X, defined as S*p* = {(i1,...,i*L*) : f*p*(i1,...,i*L*)}. This is the set of variable assignments compatible with the paths from p to ⊤.

**Fig. 2.** P-semiflows and MDD for two variable orders for the net in Fig. 1(A).

MDDs are a *canonical* representation of subsets of X: given MDD nodes p and q at the same level, S*p* = S*q* iff p = q. We observe that quasi-reduced MDDs differ from the more common *fully-reduced* MDDs, which allow edges to skip levels by eliminating all redundant nodes, not just those encoding ⊥. As will become clear, though, the quasi-reduced MDD encoding the state space of a Petri net covered by invariants *cannot contain redundant nodes*, thus coincides with the fully-reduced MDD for such models. When drawing MDDs, edges point down and we omit node ⊥, edges pointing to it, and the corresponding cells in the originating node, so that, if node p at level k with X*k* = {0,..., 4} is drawn with only the cells 2 and 3, it means that p[0] = p[1] = p[4] = ⊥. We also omit node ⊤ and edges pointing to it, but not the corresponding cell in the originating node.

MDDs have been successfully employed to generate and store the reachable state space of DEDSs, in particular Petri nets, using fixpoint *symbolic* iterations. The MDD representation of a state space S*rch* is computed as the least fixpoint of the equation Z = Z ∪ {**m***init*} ∪ {**m**′ : **m** ∈ Z ∧ ∃e ∈ E, **m** →<sup>e</sup> **m**′}, while the generation of S*sat* simply needs to consider one flow (and associated invariant) at a time, thus can be achieved by performing exactly |F| − 1 intersections of the sets of assignments satisfying each individual constraint.
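The fixpoint equation can be made concrete with an explicit-state sketch. Symbolic tools perform the same iteration on MDDs rather than on Python sets; the two-place net and the names `successors` and `reachable` below are assumptions introduced only for illustration.

```python
# Hypothetical two-place net (an assumption): t0 moves a token p0 -> p1,
# t1 moves it back. Rows are places, columns are transitions.
C_MINUS = [[1, 0],
           [0, 1]]
C_PLUS = [[0, 1],
          [1, 0]]

def successors(m):
    """Markings reachable from m by firing one enabled transition."""
    places = range(len(m))
    for t in range(len(C_MINUS[0])):
        if all(m[p] >= C_MINUS[p][t] for p in places):
            yield tuple(m[p] + C_PLUS[p][t] - C_MINUS[p][t] for p in places)

def reachable(m_init):
    """Iterate Z = Z ∪ {m_init} ∪ successors(Z) until nothing new is added."""
    z = {m_init}
    while True:
        new = {m2 for m in z for m2 in successors(m)} - z
        if not new:
            return z  # least fixpoint reached
        z |= new

print(sorted(reachable((2, 0))))  # [(0, 2), (1, 1), (2, 0)]
```

Every marking found satisfies the invariant m[p0] + m[p1] = 2 of this hypothetical net, in line with S*rch* ⊆ S*sat*.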

Since we focus on the size of the MDD encoding S, we only consider MDDs with a single root node r, so that S*r* = S. Letting N*k* be the set of MDD nodes at level k, we characterize the MDD size in terms of its nonterminal nodes N, i.e., |N| = Σ<sup>L</sup><sub>k=1</sub> |N*k*| (although, unlike for BDDs [10] where nodes have exactly two outgoing edges, the number of MDD edges Σ<sup>L</sup><sub>k=1</sub> |{(p, i) : p ∈ N*k*, p[i] ≠ ⊥}| could also be a meaningful measure of size). The first step to generate S is to map the places P of the Petri net to the L levels of the MDD. We limit ourselves to mapping each place to a different level, i.e., requiring a *variable order* λ : P → {1,...,L}, where L = |P|. It is known that the choice of λ can exponentially affect the size of the MDD and finding an optimal mapping is NP-complete [9]. We stress that we consider only the *final* size of the MDD. In reality, the fixpoint iterations to compute S*rch* or the intersections to compute S*sat* can lead to an intermediate size of the MDD (*peak* size) that is normally much larger than the final size. However, our work to reduce the final MDD size is largely orthogonal to other strategies (like *saturation* [13] for S*rch* construction) aimed at reducing the peak size, thus both can be employed to improve efficiency.

The MDDs in Fig. 2 encode S*rch* for the Petri net of Fig. 1(A), for two different variable orders. More precisely, Fig. 2 shows, left to right, and for each order, the variable order (with level L at the top), the place bounds, the p-semiflows F<sup>+</sup>*min* (with the token count at the bottom), and the corresponding MDDs. The variable order in (B) is poor, resulting in an MDD with 40 nodes, while that in (A) requires only 19 nodes.
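The effect of the order on the node count can be reproduced with a small brute-force sketch. Several things here are assumptions rather than statements of the paper: the set encoded is S*sat* for the two invariants f1 and f2 (with every place bounded by 2), not S*rch* computed from the net; the two orders are guesses consistent with the textual description of Fig. 2; and nodes are counted as distinct sets of suffixes, which is how canonicity of quasi-reduced MDDs identifies them.

```python
from itertools import product

# Invariants f1 and f2 of Sect. 2.1, each with token count 2.
F1 = {"P0": 1, "P1b": 1, "P2b": 1, "P3b": 1}
F2 = {"P0": 1, "P1a": 1, "P2a": 1, "P3a": 1}

def s_sat(order, flows, tc=2, bound=2):
    """All vectors (listed from level L down to level 1) satisfying every invariant."""
    states = set()
    for vec in product(range(bound + 1), repeat=len(order)):
        m = dict(zip(order, vec))
        if all(sum(c * m[v] for v, c in pi.items()) == tc for pi in flows):
            states.add(vec)
    return states

def node_count(states, num_levels):
    """Non-terminal nodes of the quasi-reduced MDD: at each level, one node
    per distinct set of suffixes that can follow some prefix."""
    total = 0
    for cut in range(num_levels):  # cut = number of variables above the level
        suffixes = {}
        for s in states:
            suffixes.setdefault(s[:cut], set()).add(s[cut:])
        total += len({frozenset(v) for v in suffixes.values()})
    return total

# Assumed orders, top level first (hypothetical readings of Fig. 2).
ORDER_A = ["P1a", "P2a", "P3a", "P0", "P1b", "P2b", "P3b"]
ORDER_B = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]

print(node_count(s_sat(ORDER_A, [F1, F2]), len(ORDER_A)))  # 19
print(node_count(s_sat(ORDER_B, [F1, F2]), len(ORDER_B)))  # 40
```

Under these assumptions the counts coincide with the 19 and 40 nodes quoted for Fig. 2; the enumeration is exponential in the number of places and is meant only for toy examples.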

#### **2.3 Metrics for Variable Orders**

A metric M is a *perfect* predictor of MDD size if M(λ1) ≤ M(λ2) implies |N(λ1)| ≤ |N(λ2)| for any variable orders λ1 and λ2, where |N(λ)| is the number of nodes in the MDD for S when using variable order λ; no efficiently-computable perfect predictor is known. Metrics have been defined based on the *span of events* in the incidence matrix **C**, on the *bandwidth* of **C**, on the *center of gravity* of events, and on p-semiflows. Metrics that consider the span of each event t (distance between the top and bottom nonzero in **C**<sup>−</sup>[P, t] or **C**<sup>+</sup>[P, t] for the given variable order) are the Normalized sum of Event Span (NES), the Weighted NES (WES), Sum of Span (SOS) [24], Sum of Tops (SOT) [11] and Sum of Unique and Productive Spans (SOUPS) [25]. Classic bandwidth reduction techniques from linear algebra were applied to variable order computation for the first time in [23]. The corresponding metrics are Bandwidth (BW), Profile (PROF), or Wavefront (WF), computed on a square matrix derived from the incidence matrix **C**. Point-transition spans (PTS) is the metric used as a convergence criterion by the widely used heuristic Force [3], an algorithm for multi-dimensional clustering of graphs that has been adapted to variable order generation. A center of gravity for the variables is defined and the orders are measured in terms of *hyperdistance* of the variable from the center of gravity. PTS*<sup>P</sup>* [6] is a variation of PTS to consider also the effect of p-semiflows in the PTS variable clustering. Finally, the p-semiflow span (PSF) is the metric optimized by the heuristic defined in [5], which works by ordering the variables according to p-semiflows. PSF is a measure of the proximity of places that belong to the same p-semiflow.

An overview of these metrics can be found in [6], which also studies their coefficient of correlation to determine the predictive power of each metric over a large set of models and of orders. All models in the study are Petri nets, mostly conservative. We now provide some details for SOUPS and PSF which, together with PTS*<sup>P</sup>* , have been reported as valuable predictors [6].

SOUPS modifies the *sum of transition spans* (SOS) metric [24] by considering only once the maximal common portion of multiple transition spans having the same effect on the marking and avoids counting the bottom portion of a transition span if it checks but does not change the marking of the corresponding places. SOUPS performs particularly well in conjunction with saturation [13], as it tends to result in even smaller peak MDDs. SOS and SOUPS, just like WES and NES [24] or SOT [11], are easily computed from the matrices **C**<sup>−</sup> and **C**<sup>+</sup>.

PSF is computed analogously to SOS, but considering p-semiflow spans instead of transition spans:

$$\text{PSF}(\lambda) = \sum\_{\pi \in \mathcal{F}\_{min}^+} \left( \max \{ \lambda(v) : \pi[v] \neq 0 \} - \min \{ \lambda(v) : \pi[v] \neq 0 \} + 1 \right).$$

In our figures, the column for p-flow *π* has a dark cell with *π*[v] in it for each place v in *Supp*(*π*), a light empty cell for each place not in the support but bracketed by places in the support, and a white empty cell for the remaining places not in the support. With this notation, PSF is just the count of the number of non-white squares in the matrix of F<sup>+</sup>*min*.
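The PSF formula translates directly into code. In the sketch below, the function name `span_metric`, the dictionary encoding of p-semiflows, and the two variable orders are assumptions made for illustration; the invariants are f1 and f2 from Sect. 2.1.

```python
# p-semiflows f1 and f2 of Sect. 2.1, as place -> coefficient maps.
F1 = {"P0": 1, "P1b": 1, "P2b": 1, "P3b": 1}
F2 = {"P0": 1, "P1a": 1, "P2a": 1, "P3a": 1}

def span_metric(flows, order):
    """Sum over the flows of (highest level - lowest level + 1) of the support.
    `order` lists the places from level L (top) down to level 1."""
    level = {v: len(order) - i for i, v in enumerate(order)}
    total = 0
    for pi in flows:
        levels = [level[v] for v, coeff in pi.items() if coeff != 0]
        total += max(levels) - min(levels) + 1
    return total

# Assumed orders (hypothetical): branches kept together vs. interleaved.
GROUPED = ["P1a", "P2a", "P3a", "P0", "P1b", "P2b", "P3b"]
INTERLEAVED = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]

print(span_metric([F1, F2], GROUPED))      # 8
print(span_metric([F1, F2], INTERLEAVED))  # 13
```

Because the function only needs the supports and the levels, the same code evaluates any span-based sum over a different set of flows.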

There has been a proposal [16] to use p-semiflows to *eliminate* some state variables (decision diagram levels) through a greedy heuristic, but later work [12] observed that this leads to a loss of *locality* in the MDD representation of the transition relation, and suggested instead to use p-semiflows to *merge* variables, proving that this always reduces the MDD size. The same paper [12] also proposed to modify the sum-of-transition-tops (SOT) metric so that it considers also a set of linearly independent p-semiflows, but provided no hints about the relative weight given to transitions vs. p-semiflows when computing the metric.

#### **3 MDD and Invariants: The PF Metric**

We now begin investigating the relationship between p-flows and the shape of the MDD encoding S*rch* and S*sat*, and introduce the new metric PF.

**P-flows and information remembered at level** k**.** The invariant corresponding to a p-flow *π* imposes a constraint on the reachable markings, since it implies a constant weighted sum of the tokens in the places belonging to *Supp*(*π*). Thus, the MDD must "remember" (using distinct nodes at level k) the possible partial weighted sums corresponding to places in the invariant support that are above level k, as long as the invariant is *active*, i.e., its support contains places mapped to levels k or below, and this is true even if the place mapped to level k is not in the support. Thus, intuitively, places in the support should be mapped to levels close to each other. This can be easily seen in Fig. 2(B), where the places in the support of the two p-semiflows in F<sup>+</sup>*min* are not in consecutive levels, resulting in more nodes: the level for P2b has 9 nodes, since the MDD must remember the partial sum of tokens of the places in the two branches of the Petri net of Fig. 1(A), and each of them can range from 0 to 2. In the order of Fig. 2(A), all places in the top branch are instead above the level of place P0, which is in turn above all places in the bottom branch. Thus, level P0 has only three possible values to remember: whether in the top (and thus in the bottom) branch there are 0, 1, or 2 tokens (and therefore P0 has 2, 1, or 0 tokens, respectively). This dependence is captured by the metric PSF of Sect. 2.3. The PSF value for order (B) is 13, while it is 8 for order (A), consistent with the intuition that a smaller value of PSF results in a smaller MDD.

**P-flows and singletons.** The token count of an invariant *π* determines a *single* possible value for the number of tokens in the level "completing" *π* (the lowest level corresponding to a place in *Supp*(*π*)), which can then only contain *singletons* (nodes with a single outgoing edge). This is the case for level P0 in the MDD of Fig. 2(A) and P0 in the MDD of Fig. 2(B). Interestingly, level P3a in the MDD of Fig. 2(B) also contains only singletons. This is due to an invariant generated by a p-flow in F<sup>−</sup>:

$$\pi\_3: \quad \mathbf{m}[P\_{1a}] + \mathbf{m}[P\_{2a}] + \mathbf{m}[P\_{3a}] - \mathbf{m}[P\_{1b}] - \mathbf{m}[P\_{2b}] - \mathbf{m}[P\_{3b}] = 0.$$

As p-flows in F<sup>−</sup> have similar implications on the MDD structure as those in F<sup>+</sup>, we define a new metric PF, by extending PSF to consider also p-flows that are not p-semiflows. Given a set of p-flows F*min*, we can then define:

$$\text{PF}(\lambda) = \sum\_{\pi \in \mathcal{F}\_{\text{min}}} \left( \max \{ \lambda(p) : \pi(p) \neq 0 \} - \min \{ \lambda(p) : \pi(p) \neq 0 \} + 1 \right),$$

But what is an appropriate choice for F*min*? To have a consistent definition of the metric we need F*min* to be uniquely and appropriately defined. While p-semiflows are characterized by a unique generator set F<sup>+</sup>*min*, p-flows can be characterized by a *basis*, but the choice of basis is not unique and can lead to a meaningless value of PF (for example if we choose a basis where each p-flow has the same span over the places, so that any variable order results in the same value for the PF metric).

Continuing the analogy with PSF, we define F*min* as the set of minimal p-flows, i.e., those whose entries have g.c.d. 1 and whose support does not strictly include the support of any other p-flow; in addition, to avoid considering both a p-flow and its negative, we assume an arbitrary place order (unrelated to the MDD variable order) and require the first nonzero entry to be positive. We now prove that this set F*min* is unique and that it can generate a multiple of any p-flow. In the figures, the set F*min* is shown partitioned into F<sup>+</sup>*min* and F<sup>−</sup>*min* = F*min* \ F<sup>+</sup>*min*.

**Theorem 1.** Set F*min* is unique, and it spans all p-flow directions, i.e., given *π* ∈ F, for some a ∈ ℤ, a*π* equals a linear combination of elements in F*min*.

*Proof.* To prove uniqueness, it suffices to show that there can be at most one minimal p-flow with a given support. Assume by contradiction that there are two distinct minimal p-flows *π*<sub>1</sub> and *π*<sub>2</sub> with *Supp*(*π*<sub>1</sub>) = *Supp*(*π*<sub>2</sub>) = Q, and let a<sub>1</sub> > 0 and a<sub>2</sub> > 0 be the coefficients in *π*<sub>1</sub> and *π*<sub>2</sub> corresponding to the first place v ∈ Q, respectively. Then, define *π* = a<sub>2</sub>*π*<sub>1</sub> − a<sub>1</sub>*π*<sub>2</sub>, so that *π*[v] = 0.

If *π* ≠ **0**, then *π* ∈ F but *Supp*(*π*) ⊆ Q \ {v}, thus *π*<sub>1</sub> and *π*<sub>2</sub> cannot be minimal p-flows since their support strictly contains the support of *π*, a contradiction.

If *π* = **0**, then a<sub>2</sub>*π*<sub>1</sub> = a<sub>1</sub>*π*<sub>2</sub>, which implies a<sub>2</sub> = a<sub>1</sub>, since the g.c.d. of both *π*<sub>1</sub> and *π*<sub>2</sub> is 1. But then, *π*<sub>1</sub> = *π*<sub>2</sub>, again a contradiction.

To prove that F*min* spans all p-flow directions, consider *π*′<sub>1</sub> ∈ F. There must exist *π*<sub>1</sub> ∈ F*min* with *Supp*(*π*<sub>1</sub>) ⊆ *Supp*(*π*′<sub>1</sub>); pick v ∈ *Supp*(*π*<sub>1</sub>) and let a<sub>1</sub> = *π*<sub>1</sub>[v] and b<sub>1</sub> = *π*′<sub>1</sub>[v], so that a<sub>1</sub>*π*′<sub>1</sub> = b<sub>1</sub>*π*<sub>1</sub> + *π*′<sub>2</sub>, with *Supp*(*π*′<sub>2</sub>) ⊆ *Supp*(*π*′<sub>1</sub>) \ {v}. Either *Supp*(*π*′<sub>2</sub>) = ∅, or *π*′<sub>2</sub> is a p-flow, in which case we can repeat the process to obtain a<sub>2</sub>*π*′<sub>2</sub> = b<sub>2</sub>*π*<sub>2</sub> + *π*′<sub>3</sub>, and so on. Eventually, we must reach the case *Supp*(*π*′<sub>n+1</sub>) = ∅, i.e., *π*′<sub>n+1</sub> = **0**, at which point we can write a<sub>1</sub> ··· a<sub>n</sub>*π*′<sub>1</sub> = b<sub>1</sub>a<sub>2</sub> ··· a<sub>n</sub>*π*<sub>1</sub> + b<sub>2</sub>a<sub>3</sub> ··· a<sub>n</sub>*π*<sub>2</sub> + ··· + b<sub>n</sub>*π*<sub>n</sub>, where *π*<sub>1</sub>,...,*π*<sub>n</sub> ∈ F*min*, i.e., we can express a multiple of *π*′<sub>1</sub> as a linear combination of elements of F*min*.

We observe that the size of F*min*, like that of F<sup>+</sup>*min*, is at most exponential in |P|, since the proof of Theorem 1 shows that the elements of F*min* must have incomparable supports.

### **4 MDD and Invariants: The iRank Metric**

As we shall see in Sect. 5, both PSF and PF exhibit significant correlation with the MDD size. However, there are cases where they do not perform well, especially when F*min* is large. Consider for example the Petri net of Fig. 1(B), and the three MDDs corresponding to different variable orders in Fig. 3. This Petri net has many minimal p-flows, |F<sup>+</sup>*min*| = 8 and |F*min*| = 11. The three p-flows in F<sup>−</sup>*min* relate the places inside each fork-and-join subnet (P*i*a = P*i*b, for i = 1, 2, 3), while the eight p-semiflows in F<sup>+</sup>*min* relate the tokens in the three fork-and-join subnets with those in place P0. The order in Fig. 3(B) produces the smallest MDD (37 nodes against the 49 nodes of order (A) and 69 of (C)), but it is the one with the worst (highest) value of PSF. PF also fails to choose the order with the smallest MDD: the smallest value for PF is 55 for order (A), which is only the second best for MDD size. One reason is that, when F*min* contains many related, dependent constraints affecting a given MDD level, counting all of them may confuse the metric. On the other hand, we have seen that a metric computed on a basis depends strongly on the choice of vectors included in the basis, and is meaningless in the worst case.

We then propose iRank, a new variable order metric which, like PSF and PF, is based on linear invariants but, unlike PSF and PF, *is unaffected by redundant minimal p-flows* and *is independent of the choice of the specific p-flows* being considered, as long as they constitute a generator set. iRank focuses on the number ρ(k) of *linearly independent partial p-flows* that are still active at level k. The definition of iRank requires a deeper understanding of the relationships among the MDD structure and the p-flows, as illustrated next.

**Fig. 3.** Three variable orders for the Petri net of Fig. 1(B), and the resulting MDDs.

Given an MDD with root node r and two MDD nodes p, q ≠ ⊥ with p.*lvl* = k and q.*lvl* = h, let A*p* (for above) be the set of paths from r to p, B*p* (for below) the set of paths from p to ⊤, and C*p,q* the set of paths from p to q:

$$\begin{aligned} \mathcal{A}\_p &= \{ (i\_L, \dots, i\_{k+1}) : r[i\_L] \cdot \dots [i\_{k+1}] = p \} \\ \mathcal{B}\_p &= \{ (i\_k, \dots, i\_1) : p[i\_k] \cdot \dots [i\_1] = \top \} \\ \mathcal{C}\_{p,q} &= \{ (i\_k, \dots, i\_{h+1}) : p[i\_k] \cdot \dots [i\_{h+1}] = q \}, \end{aligned}$$

thus A*r* = B<sub>⊤</sub> = {()}, A<sub>⊤</sub> = B*r* = S*r*, A*p* = C*r,p*, B*p* = C<sub>p,⊤</sub>, and C*p,q* = ∅ if q.*lvl* ≥ p.*lvl*. When using an MDD to store S with a given variable order λ, the sets of paths defined by A*p*, B*p*, and C*p,q* also denote sets of submarkings, by interpreting i*k* as the number of tokens in place v = λ<sup>−1</sup>(k), and so on.

**Theorem 2** [28]**.** The nodes at level k can be used to define a partition of S*r*: ⋃<sub>p∈N*k*</sub> A*p* × B*p* = S*r*, and ∀p, q ∈ N*k*, p ≠ q ⇒ (A*p* × B*p*) ∩ (A*q* × B*q*) = ∅.

We can relate MDD nodes and the p-flows by proving that all submarkings described by C*p,q*, therefore by A*p*, have the same partial sum for any given p-flow. Given nodes p and q with p.*lvl* = k > q.*lvl* = h, σ = (i<sub>k</sub>,...,i<sub>h+1</sub>) ∈ C*p,q*, and a p-flow *π* ∈ F, we let the partial sum of submarking σ for invariant *π* be:

$$\operatorname{Sum}(p,q,\sigma,\pi) = \sum\_{p.lvl \ge j > q.lvl} i\_j \cdot \pi[\lambda^{-1}(j)].$$

In particular, for any σ ∈ C<sub>r,⊤</sub> = S*rch*, we have *Sum*(r, ⊤, σ, *π*) = *Tc*(*π*).

We can now introduce two fundamental properties enjoyed by an MDD encoding a state space subject to a set of p-flows F, which will pave the way to the definition of our new metric called iRank.

**Theorem 3.** Assume a set of states S subject to the set of p-flows F is encoded by an MDD rooted at r. Then, all paths between a given pair of nodes have the same partial sum for any given invariant: ∀σ, σ′ ∈ C*p,q*, ∀*π* ∈ F, *Sum*(p, q, σ, *π*) = *Sum*(p, q, σ′, *π*). We can therefore write *Sum*(p, q, *π*).

*Proof.* Consider two nodes p and q, with p.*lvl* = k > q.*lvl* = h, two paths σ and σ′ from p to q, and any σ*a* ∈ A*p* and σ*b* ∈ B*q*, so that both (σ*a*, σ, σ*b*) and (σ*a*, σ′, σ*b*) describe markings in S. Then, for any p-flow *π* ∈ F, we have that *Sum*(r, ⊤, (σ*a*, σ, σ*b*), *π*) = *Sum*(r, ⊤, (σ*a*, σ′, σ*b*), *π*) = *Tc*(*π*). However, *Sum*(r, ⊤, (σ*a*, σ, σ*b*), *π*) = *Sum*(r, p, σ*a*, *π*) + *Sum*(p, q, σ, *π*) + *Sum*(q, ⊤, σ*b*, *π*) = *Sum*(r, ⊤, (σ*a*, σ′, σ*b*), *π*) = *Sum*(r, p, σ*a*, *π*) + *Sum*(p, q, σ′, *π*) + *Sum*(q, ⊤, σ*b*, *π*), thus we must have *Sum*(p, q, σ, *π*) = *Sum*(p, q, σ′, *π*).

An even stronger property holds if the MDD encodes S*sat*: then, every node in the MDD is completely identified by a unique pattern of partial p-flow sums.

**Theorem 4.** Let S*sat* be encoded by an MDD rooted at r. Then, the nodes at level k have different partial sums:

$$\forall p, p' \in \mathcal{N}\_k, \ p \neq p' \Rightarrow \exists \pi \in \mathcal{F}, \ \operatorname{Sum}(r, p, \pi) \neq \operatorname{Sum}(r, p', \pi).$$

*Proof.* Remember that S*sat* = {**m** ∈ ℕ<sup>P</sup> : ∀*π* ∈ F, **m** · *π* = *Tc*(*π*)}. Assume that distinct nodes p, p′ ∈ N*k* satisfy ∀*π* ∈ F : *Sum*(r, p, *π*) = *Sum*(r, p′, *π*). Since the MDD is canonical, p and p′ must encode different sets, thus there must be a σ in B<sub>p</sub> \ B<sub>p′</sub> or in B<sub>p′</sub> \ B<sub>p</sub> (w.l.o.g. assume the former case). Then, considering any σ*a* ∈ A<sub>p</sub> and σ′*a* ∈ A<sub>p′</sub>, we have (σ*a*, σ) ∈ S*sat* and (σ′*a*, σ) ∉ S*sat*. But (σ*a*, σ) ∈ S*sat* implies ∀*π* ∈ F, *Sum*(r, ⊤, (σ*a*, σ), *π*) = *Tc*(*π*) and, since *Sum*(r, ⊤, (σ*a*, σ), *π*) = *Sum*(r, p, σ*a*, *π*) + *Sum*(p, ⊤, σ, *π*) = *Sum*(r, p′, σ′*a*, *π*) + *Sum*(p, ⊤, σ, *π*) = *Sum*(r, ⊤, (σ′*a*, σ), *π*), and this holds for any *π* in F, we should also have (σ′*a*, σ) ∈ S*sat*, a contradiction.
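Theorems 3 and 4 can be checked by brute force on a small instance. The sketch below is illustrative only: it encodes S*sat* for the invariants f1 and f2 of Sect. 2.1, uses an assumed (hypothetical) variable order, identifies each MDD node with the set of suffixes it encodes, and uses the two generators f1 and f2 in place of the full set F, which suffices because partial sums are linear in the p-flow.

```python
from itertools import product

F1 = {"P0": 1, "P1b": 1, "P2b": 1, "P3b": 1}
F2 = {"P0": 1, "P1a": 1, "P2a": 1, "P3a": 1}
FLOWS = [F1, F2]
# Assumed order, listed from level L (top) down to level 1.
ORDER = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]

def weighted_sum(pi, places, values):
    """Sum of pi[v] * value over the (possibly partial) assignment."""
    return sum(pi.get(v, 0) * i for v, i in zip(places, values))

# S_sat within the bounds 0..2 implied by the invariants (token count 2).
S_SAT = {vec for vec in product(range(3), repeat=len(ORDER))
         if all(weighted_sum(pi, ORDER, vec) == 2 for pi in FLOWS)}

def partial_sums_per_node(cut):
    """For the level below the first `cut` variables, map each node (its set
    of suffixes) to the partial-sum vectors of the paths reaching it."""
    suffixes = {}
    for s in S_SAT:
        suffixes.setdefault(s[:cut], set()).add(s[cut:])
    sums = {}
    for prefix, suf in suffixes.items():
        vector = tuple(weighted_sum(pi, ORDER, prefix) for pi in FLOWS)
        sums.setdefault(frozenset(suf), set()).add(vector)
    return sums

for cut in range(len(ORDER)):
    sums = partial_sums_per_node(cut)
    same_sum_on_all_paths = all(len(v) == 1 for v in sums.values())       # Theorem 3
    nodes_have_distinct_sums = len(set().union(*sums.values())) == len(sums)  # Theorem 4
    print(len(ORDER) - cut, len(sums), same_sum_on_all_paths, nodes_have_distinct_sums)
```

Each printed row gives the level, the number of nodes at that level, and whether the two properties hold there.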

**Fig. 4.** Computations of the rank weights from a matrix **F** with rank(**F**) = 4.

Theorem 4 implies that every node in the MDD encoding S*sat* is completely identified by a unique pattern of partial p-flow sums. However, not every p-flow is relevant at a given level k of the MDD, and, more importantly, the portions from level L to level k of different p-flows may encode the same information, i.e., may be *linearly dependent*, yet these redundant portions contribute to the computation of the PF metric. iRank, then, attempts to estimate the number of *possible* combinations of partial path sums that may actually be found in the nodes at level k of the MDD, taking into account these linear dependencies.

To this end, we consider the |P|×|F*min*| matrix **F** (rows ordered according to λ, columns in any order) describing the p-flows in F*min*, and define the number ρ*up*(k) of linearly independent partial p-flows up to level k:

$$\rho\_{up}(k) = \text{rank}(\mathbf{F}[L:k+1,\cdot]),$$

where **F**[L : k + 1, ·] is the submatrix of **F** with rows L through k + 1 (level k is excluded because we are counting the partial sums *reaching* level k). ρ*up*(k) counts both p-flows active at level k and those that are not, as the lowest place in their support is mapped to a level above k (p-flow already "closed" at level k). The number ρ*down*(k) of linearly independent closed p-flows at level k is obtained by subtracting the rank of submatrix **F**[k : 1, ·] from the rank of the entire matrix **F**:

$$
\rho\_{down}(k) = \text{rank}(\mathbf{F}) - \text{rank}(\mathbf{F}[k:1, \cdot]).
$$

Then, the value we are seeking is the difference of these two quantities:

$$
\rho(k) = \rho\_{up}(k) - \rho\_{down}(k).
$$

Figure 4 depicts the definition of ρ*up*(k) and ρ*down*(k). The rectangles in the invariant matrix **F** represent the portions used to compute the ranks for level k. The values of ρ*up*(k), ρ*down*(k), and ρ(k), for all levels k, are listed on the right.

The value of the iRank metric is then the sum of all the ρ(k) values:

$$\text{iRank} = \sum\_{1 \le k \le L} \rho(k),$$

which can be thought of as an estimate of the number of independent factors affecting the number of MDD nodes at the various levels. Thus, we should expect that a linear increase in iRank implies an exponential increase in the MDD size. The main advantage of iRank is that it does not suffer in the presence of an excessive number of p-flows (as do PF and PSF). Indeed, since the metric is computed on the rank of **F** and on the rank of sets of rows of **F**, and since these ranks do not change when adding linear combinations of p-flows (larger **F**) or when removing p-flows (smaller **F**) as long as we remove only linearly dependent vectors, we have a metric that is rather robust. Additionally, it is also fairly inexpensive to compute, O(min{|P|, |T|}<sup>3</sup>).
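The definitions of ρ*up*, ρ*down*, and iRank can be turned into a short sketch. The code below is an illustration, not the implementation used in the experiments: the exact rank computation by Gaussian elimination over fractions, the dictionary encoding of p-flows, and the two variable orders are all assumptions; the p-flows are f1, f2, and π3 from Sects. 2.1 and 3.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0]) if m else 0):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

def irank(flows, order):
    """Sum over levels k of rho_up(k) - rho_down(k).
    `order` lists the places from level L (top) down to level 1."""
    F = [[pi.get(v, 0) for pi in flows] for v in order]  # row 0 is level L
    L = len(order)
    full = rank(F)
    total = 0
    for k in range(1, L + 1):
        rho_up = rank(F[:L - k])             # rows for levels L .. k+1
        rho_down = full - rank(F[L - k:])    # rows for levels k .. 1
        total += rho_up - rho_down
    return total

F1 = {"P0": 1, "P1b": 1, "P2b": 1, "P3b": 1}
F2 = {"P0": 1, "P1a": 1, "P2a": 1, "P3a": 1}
PI3 = {"P1a": 1, "P2a": 1, "P3a": 1, "P1b": -1, "P2b": -1, "P3b": -1}

# Assumed orders (hypothetical): branches kept together vs. interleaved.
GROUPED = ["P1a", "P2a", "P3a", "P0", "P1b", "P2b", "P3b"]
INTERLEAVED = ["P1b", "P1a", "P2b", "P2a", "P3b", "P3a", "P0"]

print(irank([F1, F2], GROUPED), irank([F1, F2], INTERLEAVED))            # 6 10
print(irank([F1, F2, PI3], GROUPED), irank([F1, F2, PI3], INTERLEAVED))  # 6 10
```

Adding π3, which equals f2 − f1, leaves both values unchanged, which illustrates the robustness to linearly dependent p-flows described above.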

#### **5 Experimental Assessment of the Metrics**

We now experimentally assess the efficacy of PF and iRank: since the relationship between p-flows and MDD nodes is stronger for S*sat* than for S*rch* (Theorem 4), we expect higher correlation when the MDD encodes the former. We also seek to determine whether these metrics can be used to drive iterative heuristics or metaheuristics that compute variable orders. All experiments are on different sets of orders for 40 models taken from the Petri Net Repository [2]. The experiments have been conducted using the GreatSPN tool [1,4], which uses the Meddly library [8]. All MDDs generated had fewer than one million nodes. We follow the evaluation procedure of [6] and compute the Spearman coefficient of correlation (CC), whose interpretation is: [0.8, 1] means very strong correlation, [0.6, 0.8] strong correlation, [0.4, 0.6] moderate correlation, and so on decreasing. Negative values indicate anti-correlation.
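For readers unfamiliar with the Spearman CC, the following sketch shows how it can be computed for one model and one metric: both series are replaced by their ranks, and the Pearson correlation of the ranks is returned. The function names and the five data points are hypothetical; the paper's experiments use the procedure of [6], which is not reproduced here.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman CC: the Pearson correlation of the two rank sequences."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical data: metric values and MDD sizes for five variable orders.
metric_values = [12, 15, 9, 30, 22]
mdd_sizes = [400, 900, 350, 1200, 5000]
print(spearman(metric_values, mdd_sizes))
```

The sketch divides by zero when either series is constant, so a real evaluation would need to handle that case explicitly.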

Figure 5 compares the correlation of iRank and PF to that of the metrics of Sect. 2.3. Although all experiments have been performed, for the sake of space only 6 metrics are considered in the tables. We have chosen to include PSF, PF and iRank (for obvious reasons), plus the best among the **C** span metrics (SOUPS), and two versions of PTS (PTS and PTS*<sup>P</sup>*, without and with p-flow) since PTS is the metric implicitly optimized by the widely used Force heuristic. No bandwidth metric is reported since they all exhibit at best a moderate correlation. Each row represents a metric; columns report the CC of the metrics with the MDD encoding S*sat* (columns [A] and [B]) and S*rch* (columns [C] and [D]) for two different sets of orders. The CC of a single model for a single metric is computed from the bivariate series relating, for each variable order λ, the MDD size built using λ with the value of the metric for that λ. ICC is the CC computed over the set of orders λ in VIMPR and BCC is computed over VBEST. The sets VIMPR and VBEST are built from 1,000 initial random orders by generating sequences of


**Fig. 5.** Two correlation coefficients for different metrics for S*sat* and S*rch*

increasingly better orders (in terms of MDD final size) until a convergence criterion is satisfied; VIMPR retains all orders while VBEST retains only the last order in each sequence (thus exactly 1,000 orders). This construction is explained in [6], where it was observed that VBEST tends to contain mostly good orders, and VIMPR a mixture of good and bad orders. The above sets have been built for each of the 40 models. For each combination, we report the mean CC (over all models) and the CC distribution for the 40 models; the x axis is partitioned into 20 bins, so the y axis indicates the number of models whose CC falls into each bin. All plots have the same scale on the y axis, and the height of the bar at 0 is fixed at 36 for all rows.

The results of Fig. 5 indicate that iRank has the highest correlation for both ICC and BCC and for both S*rch* and S*sat*. iRank is better than the second best by 12% (ICC on S*sat*) and up to 28% (BCC on S*rch*). The comparison with PTS (the metric used as a convergence criterion by the widely used Force heuristic) is even more striking. It is also evident that in none of the four cases does PF perform better than PSF, supporting our observation that considering more p-flows is not always (or even usually) a good idea. Figure 5 also indicates that all metrics have better CC when the MDD encodes S*sat* (column [A] vs. [C], and column [B] vs. [D]). This is not surprising for iRank, given Theorem 4, but it also holds for all other metrics. This could be due to the fact that, since S*rch* ⊆ S*sat*, the MDD for S*rch* encodes additional constraints not captured by any of the metrics.

Comparing columns [A] and [B] (and columns [C] and [D]) of Fig. 5, we observe that ICC is higher than BCC for all metrics, meaning that they have better correlation when the set of considered orders is VIMPR (mix of good and bad orders) rather than VBEST (mostly good orders). This is related to the use of the Spearman CC, which quantifies how well the i-th largest value of the metric correlates with the i-th largest value of the MDD size: certainly with VBEST we tend to have more MDDs of similar size, making it more difficult to discriminate.

The experiments reported in Fig. 6 serve to evaluate whether the metrics can be used as an objective function inside a simulated annealing procedure (columns [A] and [C]) or as a meta-heuristic to select one among the orders produced during the simulated annealing (columns [B] and [D]). Given an initial variable order and a metric m, the procedure searches for an "optimal" order through


**Fig. 6.** Evaluation of metrics on simulated annealing produced orders


**Fig. 7.** Evaluation of metrics on Force-produced orders.

a simulated annealing procedure [17], aimed at minimizing the value of m. We employ a standard simulated annealing procedure, described in [6]. Unlike the construction of the set of orders used for the computation of ICC and BCC in Fig. 5, no MDD is built during the construction of the candidate variable order. For each metric, the simulated annealing procedure is run 1,000 times, from different initial orders, and Fig. 6 reports, in columns [A] and [C], the mean and distribution of the "score" of the MDDs built using the 1,000 orders produced by the 1,000 runs of the simulated annealing for each metric m, for the 40 models. The score is the distance from the size of the smallest MDD built, normalized on the distance between the smallest and the largest MDD size built (see [6], Eq. 5), obviously computed separately for each model. A value of 1 for order λ for a given model indicates that the smallest MDD seen for that model was built using λ. A value of 0 indicates the worst order. Column [A] refers to the MDDs storing S*sat*, while column [C] refers to S*rch*. Again, iRank performs better than any other metric in both cases.
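To make the role of the metric as an objective function concrete, here is a generic simulated annealing sketch over variable orders. It is not the procedure of [6]: the swap move, the geometric cooling schedule, the parameter defaults, and the toy metric `toy_span` are all assumptions chosen for illustration. As in the text, only the metric is evaluated and no MDD is built.

```python
import math
import random

def anneal_order(initial_order, metric, steps=2000, t_start=1.0, t_end=0.01, seed=0):
    """Search for an order with a low metric value; no MDD is built."""
    rng = random.Random(seed)
    current = list(initial_order)
    current_cost = metric(current)
    best, best_cost = list(current), current_cost
    for step in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]      # propose a swap
        cost = metric(current)
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_cost = cost                               # accept the move
            if cost < best_cost:
                best, best_cost = list(current), cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost

# Toy metric (hypothetical): total distance between variables that interact.
interacting_pairs = [(0, 1), (1, 2), (3, 4), (4, 5)]

def toy_span(order):
    position = {var: idx for idx, var in enumerate(order)}
    return sum(abs(position[a] - position[b]) for a, b in interacting_pairs)

start = [0, 3, 1, 4, 2, 5]
best, cost = anneal_order(start, toy_span)
print(best, cost)
```

Replacing `toy_span` with an implementation of iRank or PTS gives the setting evaluated in columns [A] and [C], up to the differences from the procedure of [6] noted above.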

Columns [B] and [D] instead report the results of using each metric m as a meta-heuristic: for each model a single order is chosen (the order with the best value for metric m), and the 40 resulting scores are plotted. This corresponds to using metrics in practice to select a given order for a model. Again, iRank shows the best performance, indicating that it can select good candidate orders.
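The selection step and the score can be sketched as follows. The normalization in `score` is an assumption: it is written so that 1 corresponds to the smallest MDD and 0 to the largest, as described in the text, but the exact formula is Eq. 5 of [6], which is not reproduced here. The candidate data are hypothetical, and a lower metric value is assumed to be better.

```python
def score(size, smallest, largest):
    """Assumed normalization: 1 for the smallest MDD seen, 0 for the largest."""
    if largest == smallest:
        return 1.0
    return 1.0 - (size - smallest) / (largest - smallest)

def select_by_metric(candidates, metric):
    """Meta-heuristic step: keep the candidate with the lowest metric value."""
    return min(candidates, key=metric)

# Hypothetical data for one model: (order name, metric value, MDD size).
candidates = [("a", 14, 820), ("b", 9, 310), ("c", 11, 250), ("d", 20, 4100)]
sizes = [c[2] for c in candidates]
chosen = select_by_metric(candidates, metric=lambda c: c[1])
print(chosen[0], score(chosen[2], min(sizes), max(sizes)))
```

Here the metric picks order "b", which is not the smallest MDD (that is "c") but still scores close to 1.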

Figure 7 shows the evaluation of a meta-heuristic also defined in [6], based on Force. Each metric m is used to drive the selection of the "best" variable order among a set of variable orders produced using Force from an initial set of 1,000 random orders. This is done for each of the 40 models. The last row is the baseline (40×1,000 points, all computed using Force), while all other histograms are built out of 40 MDD sizes, one per model. A mean value greater than the baseline mean indicates that the metric selects the best orders among the ones computed by Force. A mean below the baseline indicates otherwise. Again, when we employ iRank to select the order to use, we get a better score than with any other metric for both S*sat* (left column) and S*rch* (right column).

### **6 Conclusion and Future Work**

We considered the problem of defining and evaluating variable orders for MDDs encoding either the reachable states of a DEDS (S*rch*) or the states satisfying a set of linear invariants (S*sat*). We studied the relation between the MDD size and structure and the linear invariants, and proposed two new metrics: PF, a trivial extension to PSF; and iRank. Through a set of experiments, metrics have been evaluated both as predictors of the MDD size and as drivers for two heuristics (and associated meta-heuristics). The experiments follow the procedure proposed in [6], as defining a good and fair procedure to compare metrics and MDD sizes for a set of models is a nontrivial task. The results show that iRank is better than any other metric we found in the literature.

The definition of iRank, and PF, assumes that linear invariants are available. For DEDSs specified as Petri nets, the linear invariants are derived from the p-flows, the left annullers of the incidence matrix, an integer matrix describing how an event modifies a state. Clearly, whenever a DEDS can be specified through a similar matrix, the application of our method is straightforward, as in the case of various formalisms used in system modeling and verification. For other formalisms, this may be less immediate, but our method only assumes a set of linear invariants on the state space, regardless of how they are computed.

In our experiments, we considered only *conservative* Petri nets, where each place appears in at least one invariant. This allowed us to compare with previously defined metrics that exploit linear invariants generated from p-semiflows. If no invariants are available, or if most places are not part of any invariant, PF and iRank could perform very poorly. If a net is not conservative, a subset of places may "lose" tokens, "gain" tokens, or both. The last two cases cause S*rch* to be infinite, but the first case can still be managed by our approach, thanks to p-flows. As an example, consider the net obtained from the net in Fig. 1(B) by removing the arc from transition T<sub>3</sub> back to P<sub>0</sub>. Such a net does not have any p-semiflow, but all the places between each pair of fork-and-join belong to a p-flow, allowing us to apply our method. A further extension could consider invariants where the weighted sum of tokens in a subset of places is less than or equal to a constant (instead of just equal).

Several directions for additional exploration remain. First, iRank does not consider the initial state of the DEDS, but the number of nodes at a given level depends on the token count of the p-flows, and this may be especially important when the p-flows have significantly different token counts. Then, the efficient computation of iRank is obviously important, as heuristics using it could probably evaluate it many times. The computation could be expensive since it involves matrix rank computations.

Finally, we are interested in extending iRank to more general constraints, which can still provide hints on good variable orders; for example, a constraint "if A = 3 then B = C" imposes no limitations on C along paths where A ≠ 3 (assuming A is above B and B is above C in the MDD), but requires remembering the value of B until reaching C along paths where A = 3.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Binary Decision Diagrams with Edge-Specified Reductions**

Junaid Babar, Chuan Jiang, Gianfranco Ciardo, and Andrew Miner

Department of Computer Science, Iowa State University, Ames, IA 50011, USA {junaid,cjiang,ciardo,asminer}@iastate.edu

**Abstract.** Various versions of binary decision diagrams (BDDs) have been proposed in the past, differing in the reduction rule needed to give meaning to edges skipping levels. The most widely adopted, fully-reduced BDDs and zero-suppressed BDDs, excel at encoding different types of boolean functions (if the function contains subfunctions independent of one or more underlying variables, or it tends to have value zero when one of its arguments is nonzero, respectively). Recently, new classes of BDDs have been proposed that, at the cost of some additional complexity and larger memory requirements per node, exploit both cases. We introduce a new type of BDD that we believe is conceptually simpler, has small memory requirements in terms of node size, tends to result in fewer nodes, and can easily be further extended with additional reduction rules. We present a formal definition, prove canonicity, and provide experimental results to support our efficiency claims.

### **1 Introduction**

Decision diagrams (DDs) have been widely adopted for a variety of applications. This is due to their often compact, graph-based representations of functions over boolean variables, along with operations to manipulate those boolean functions based on the sizes of the graph representations, rather than the size of the domain of the function. Most DD types are canonical for boolean functions: for a fixed ordering of the function variables, each function has a unique (modulo graph isomorphism) DD representation, or encoding.

Compactness, and canonicity, is achieved through careful rules for eliminating nodes. All canonical DDs eliminate nodes that duplicate information: if nodes p and q encode the same function, one of them is discarded. Additional compactness comes from a reduction rule (or rules) that specifies both how to interpret "long" edges that skip over function variables, and how to eliminate nodes and replace them with long edges. Two popular forms of decision diagrams, Binary Decision Diagrams (BDDs) [1] and Zero-suppressed binary Decision Diagrams (ZDDs) [8], use different reduction rules. Some applications are more suitable for BDDs while others are more suitable for ZDDs, depending on which of the two reductions can be applied to a greater number of nodes. Unfortunately, it is not always easy to know, *a priori*, which reduction rule is best for a particular application. Worse, there are applications where *both* rules are useful.

Recently, Tagged BDDs (TBDDs) [10] and Chain-reduced BDDs (CBDDs) or ZDDs (CZDDs) [2] have been introduced to combine the reduction rules of BDDs and ZDDs. We introduce a new type of BDD, called *Edge Specified Reduction* BDDs (ESRBDDs), that we believe is conceptually simpler and has smaller node storage requirements than TBDDs, CBDDs, and CZDDs, while still exploiting the BDD and ZDD reduction rules. Additionally, ESRBDDs are flexible in that additional reduction rules may be added with low cost. Finally, unlike TBDDs, CBDDs, and CZDDs, ESRBDDs treat the BDD and ZDD reduction rules equally: there is no need to prioritize one rule over another.

The paper is organized as follows. Section 2 recalls definitions for BDDs and ZDDs and describes related work. Section 3 formally defines ESRBDDs, gives their reduction algorithm, proves that they are a canonical form, and compares them with related DDs. Section 4 gives detailed experimental results to show how the various DDs compare in practice. Section 5 provides conclusions.

### **2 Related Decision Diagrams**

We focus on various types of DDs that have been proposed to efficiently encode boolean functions of boolean variables, and briefly recall DDs relevant to our work. For consistency in notation, all DD types we present encode functions of the form f : B<sup>L</sup> → B and have L levels, with level L at the top.

The first and most widely-known type is the *reduced-ordered binary decision diagrams* (BDDs) [1]. A BDD is a directed acyclic graph where the two terminal nodes **0** and **1** are at level 0 (we write *lvl*(**0**) = *lvl*(**1**) = 0), while each nonterminal node p belongs to a level *lvl*(p) ∈ {1, ..., L} and has two outgoing edges, p[0] and p[1], pointing to nodes at lower levels (this is the "ordered" property). The "reduced" property instead forbids both *duplicate* nodes (p and q are duplicates if *lvl*(p) = *lvl*(q), p[0] = q[0], and p[1] = q[1]), and *redundant* nodes (p is redundant if p[0] = p[1]). The function F<sub>p</sub> encoded by BDD node p is defined as

$$F\_p(x\_{1:L}) = \begin{cases} F\_{p[x\_{lvl(p)}]}(x\_{1:L}) & lvl(p) > 0\\ p & lvl(p) = 0, \end{cases}$$

where (x<sub>1:L</sub>) is a shorthand for the boolean tuple (x<sub>1</sub>, ..., x<sub>L</sub>).
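The recursive definition of F<sub>p</sub> amounts to following one edge per visited node until a terminal is reached. The sketch below is illustrative only: the `Node` class, the use of the integers 0 and 1 for the terminals, and the padded input tuple are representation choices made here, not part of the paper.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Node:
    level: int                    # lvl(p), in 1..L
    low: "Union[Node, int]"       # p[0]
    high: "Union[Node, int]"      # p[1]

def bdd_eval(p, x):
    """x[i] holds x_i for i in 1..L (x[0] is unused padding)."""
    while not isinstance(p, int):             # terminals are the ints 0 and 1
        p = p.high if x[p.level] else p.low   # follow p[x_lvl(p)]
    return p

# x_3 AND x_1 over L = 3 variables; level 2 is skipped, so x_2 is a don't care.
n1 = Node(1, 0, 1)
root = Node(3, 0, n1)
print([bdd_eval(root, (None, a, b, c)) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
```

The edge from the level-3 node to the level-1 node is a long edge: the skipped variable is simply never inspected, which is the fully-reduced interpretation.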

Another widely-used type is the *zero-suppressed binary decision diagrams* (ZDDs) [8], which differ from BDDs only in that they forbid *high-zero* nodes (node p is high-zero if p[1] = **0**) instead of redundant nodes. The function encoded by ZDD node p is defined with respect to a level n ≥ m = *lvl*(p), as

$$F\_p^n(x\_{1:n}) = \begin{cases} 0 & n > m \land \exists i, m < i \le n, x\_i = 1\\ F\_p^m(x\_{1:m}) & n > m \land \forall i, m < i \le n, x\_i = 0\\ F\_{p[x\_m]}^{m-1}(x\_{1:m-1}) & n = m > 0\\ p & n = m = 0. \end{cases}$$
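The four cases of the ZDD definition can be read as the following evaluator. This is a sketch under assumed representations: a nonterminal node is a tuple `(level, low, high)`, terminals are the integers 0 and 1, and `x[0]` is padding so that `x[i]` holds x<sub>i</sub>.

```python
def zdd_eval(p, n, x):
    """Evaluate F^n_p; p is a terminal (int 0 or 1) or a tuple (level, low, high)."""
    m = 0 if isinstance(p, int) else p[0]
    # Levels m+1..n are skipped: any variable set to 1 there forces the value 0.
    if any(x[i] for i in range(m + 1, n + 1)):
        return 0
    if m == 0:
        return p
    _, low, high = p
    return zdd_eval(high if x[m] else low, m - 1, x)

# The terminal 1 alone, read at level n = 3, encodes "all variables are 0".
print([zdd_eval(1, 3, (None, a, b, c)) for a in (0, 1) for b in (0, 1) for c in (0, 1)])
```

Note that, unlike in a BDD, the same node encodes different functions depending on the level n from which it is reached, which is why the definition carries n explicitly.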

Both BDDs and ZDDs are *canonical*: any function f : B<sup>L</sup> → B has a unique node p encoding it, an essential property guaranteeing *time* efficiency. Just as important is their *memory* efficiency, i.e., the number of nodes required to encode a given function. In this respect, BDDs and ZDDs are particularly suited to different situations. BDDs require fewer nodes if there are many "don't cares", i.e., it often happens that F<sub>p</sub>(x<sub>1:L</sub>) = F<sub>p</sub>(y<sub>1:L</sub>) when x<sub>1:L</sub> and y<sub>1:L</sub> differ in one position, as this corresponds to redundant nodes, not stored in BDDs. ZDDs require fewer nodes if the function tends to have value 0 when many arguments have value 1, as this corresponds to high-zero nodes, not stored in ZDDs.

*Quasi-reduced BDDs* (QBDDs) [5] are also canonical: they are just like BDDs (or ZDDs) except they only forbid duplicate nodes. QBDD edges connect nodes on adjacent levels. Since edges are not allowed to *skip* levels, nodes do not need to store level information, and redundant and high-zero nodes cannot be eliminated. A useful variation is to eliminate only redundant (or high-zero) nodes whose children are **0**, and thus allow long edges directly to **0**. In either case, QBDDs require at least as many nodes as BDDs and ZDDs to encode a given function, so they provide an upper bound on both the BDD and the ZDD sizes.
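The upper-bound relation between QBDD, BDD, and ZDD sizes can be checked on small functions by building all three from a truth table. The sketch below is an illustration written for this purpose (exponential in L, with a hypothetical `count_nodes` helper); it implements the plain QBDD without the variation that allows long edges to **0**.

```python
def count_nodes(truth_table, levels, rule):
    """Count nonterminal nodes; rule is 'qbdd', 'bdd', or 'zdd'.

    truth_table[i] is f(x), where bit k-1 of i holds x_k (x_L is the top bit).
    """
    unique = {}                                   # (level, low, high) -> node

    def build(n, table):
        if n == 0:
            return ("T", table[0])                # terminal 0 or 1
        half = len(table) // 2
        low = build(n - 1, table[:half])          # cofactor with x_n = 0
        high = build(n - 1, table[half:])         # cofactor with x_n = 1
        if rule == "bdd" and low == high:
            return low                            # redundant node, not stored
        if rule == "zdd" and high == ("T", 0):
            return low                            # high-zero node, not stored
        key = (n, low, high)
        return unique.setdefault(key, ("N", len(unique), n))

    build(levels, tuple(truth_table))
    return len(unique)

constant_one = [1] * 8                  # many don't cares: good for BDDs
only_all_zero = [1] + [0] * 7           # 1 only when every x_i is 0: good for ZDDs
for name, tt in (("constant 1", constant_one), ("all-zero input", only_all_zero)):
    print(name, {r: count_nodes(tt, 3, r) for r in ("qbdd", "bdd", "zdd")})
```

On these two functions each of the BDD and ZDD rules wins once, and the QBDD count is never smaller than either.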

Various decision diagrams have been proposed to combine the characteristics of BDDs and ZDDs and exploit the reduction potential of both. *Tagged binary decision diagrams* (TBDDs) [10] associate a level tag to each edge. BDD reductions are implied along the edge from the level of the node to the level of the tag, and ZDD reductions are implied from the level of the tag to the level of the node pointed to by the edge. Alternatively, TBDDs can apply reductions in the reverse order along an edge: ZDD reductions first and BDD reductions second. Either reduction order can be used in TBDDs, but a TBDD can only use one of them, i.e., they cannot both be used in the same TBDD.

*Chain-reduced BDDs* (CBDDs) and *chain-reduced ZDDs* (CZDDs) [2] augment BDDs and ZDDs by using nodes to encode chains of high-zero nodes and redundant nodes, respectively. Each node specifies two levels, the first level indicating where the chain starts (similar to the level of an ordinary BDD or ZDD node), and the second, additional, level indicating where the chain ends.

Finally, *ordered Kronecker functional decision diagrams* [3] allow multiple decomposition types (Shannon, positive Davio, and negative Davio), enabling both BDD and ZDD reductions. However, each level has a fixed decomposition type, thus this approach is less flexible, potentially less efficient, and hindered by the need to know which decomposition will perform best for each level.

### **3 ESRBDDs**

**Definition 1.** An L-level *(ordered) edge-specified reduction* binary decision diagram (ESRBDD) is a directed acyclic graph where the two *terminal* nodes **0** and **1** are at level 0, *lvl*(**0**) = *lvl*(**1**) = 0, while each *nonterminal* node p belongs to a level *lvl*(p) ∈ {1, ..., L} and has two outgoing edges, p[0] and p[1], pointing to nodes at lower levels. An edge is a pair e = ⟨e.rule, e.node⟩, where e.rule is a *reduction rule* in {S, L<sub>0</sub>, H<sub>0</sub>, X} and e.node is the node to which edge e points. For i ∈ {0, 1}, if *lvl*(p[i].node) = *lvl*(p) − 1, we say that p[i] is a *short* edge and require that p[i].rule = S. If instead *lvl*(p[i].node) < *lvl*(p) − 1, the only other possibility, we say that p[i] is a *long* edge, since it "skips over" one or more levels, and require that p[i].rule ∈ {H<sub>0</sub>, L<sub>0</sub>, X}.

The reduction rule on an edge specifies its meaning when skipping levels, thus it is just S for short edges while, for long edges, the rules H<sub>0</sub>, L<sub>0</sub>, and X correspond to the "zero-suppressed" rule of [8], the "one-suppressed" rule (a new rule analogous to the zero-suppressed, as we shall see), and the "fully-reduced" rule of [1], respectively. To make this more precise, we recursively define the boolean function F<sup>n</sup><sub>⟨κ,p⟩</sub> : B<sup>n</sup> → B encoded by an ESRBDD edge ⟨κ,p⟩ with respect to a level n ∈ {0, ..., L}, subject to *lvl*(p) ≤ n, as

$$F\_{\langle \kappa, p \rangle}^{n}(x\_{1:n}) = \begin{cases} p & \text{if } lvl(p) = n = 0 \\ (x\_n) \text{ ? } F\_{p[1]}^{n-1}(x\_{1:n-1}) : F\_{p[0]}^{n-1}(x\_{1:n-1}) & \text{if } lvl(p) = n > 0 \\ (x\_n) \text{ ? } F\_{\langle \kappa, p \rangle}^{n-1}(x\_{1:n-1}) : F\_{\langle \kappa, p \rangle}^{n-1}(x\_{1:n-1}) & \text{if } lvl(p) < n, \kappa = \mathbf{X} \\ (x\_n) \text{ ? } \mathbf{0} : F\_{\langle \kappa, p \rangle}^{n-1}(x\_{1:n-1}) & \text{if } lvl(p) < n, \kappa = \mathbf{H}\_0 \\ (x\_n) \text{ ? } F\_{\langle \kappa, p \rangle}^{n-1}(x\_{1:n-1}) : \mathbf{0} & \text{if } lvl(p) < n, \kappa = \mathbf{L}\_0, \end{cases}$$

where the if-then-else operator (x<sub>n</sub>) ? f<sub>1</sub> : f<sub>0</sub> is a shorthand for (¬x<sub>n</sub> ∧ f<sub>0</sub>) ∨ (x<sub>n</sub> ∧ f<sub>1</sub>).
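The five cases of this definition translate directly into a recursive evaluator. The following is a minimal sketch with assumed representations (the `Edge` and `Node` classes, the rule names as strings, integer terminals, and the helper `table` are introduced here for illustration and are not the authors' data structures).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Edge:
    rule: str        # "S", "X", "H0", or "L0"
    node: object     # a Node, or a terminal given as the int 0 or 1

@dataclass(frozen=True)
class Node:
    level: int       # lvl(p) in 1..L
    low: Edge        # p[0]
    high: Edge       # p[1]

def lvl(p):
    return 0 if isinstance(p, int) else p.level

def esr_eval(edge, n, x):
    """Evaluate F^n_<rule,node>(x_1..x_n); x[i] holds x_i and x[0] is padding."""
    p = edge.node
    if lvl(p) == n:
        if n == 0:
            return p
        return esr_eval(p.high if x[n] else p.low, n - 1, x)
    # lvl(p) < n: level n is skipped, and the edge rule gives it a meaning.
    if edge.rule == "X":
        return esr_eval(edge, n - 1, x)                  # x_n is a don't care
    if edge.rule == "H0":
        return 0 if x[n] else esr_eval(edge, n - 1, x)   # x_n = 1 yields 0
    if edge.rule == "L0":
        return esr_eval(edge, n - 1, x) if x[n] else 0   # x_n = 0 yields 0
    raise ValueError("a short edge (rule S) cannot skip a level")

def table(edge, n):
    """Truth table of F^n_edge; bit k-1 of the index holds x_k."""
    return [esr_eval(edge, n, (None,) + tuple((i >> (k - 1)) & 1 for k in range(1, n + 1)))
            for i in range(2 ** n)]

n1 = Node(1, Edge("S", 0), Edge("S", 1))          # encodes x_1
root = Node(3, Edge("H0", 1), Edge("X", n1))      # mixes H0 and X long edges
print(table(Edge("S", root), 3))
```

The example root uses an H<sub>0</sub> edge on one branch and an X edge on the other, something a plain BDD or ZDD cannot express with a single node. Evaluating `table(Edge(rule, 0), 3)` for the three long-edge rules also shows that every long edge to terminal **0** encodes the constant 0 function.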

We defined an ESRBDD as a directed acyclic graph, so it can potentially have multiple *roots* (nodes with no incoming edges). However, since our focus is on the size of the DD encoding a given function, we assume from now on that our ESRBDDs have a single root node p, pointed to by a *dangling edge* with rule κ. We denote the set of all nodes reachable from p (and therefore all nodes in the ESRBDD) as *Nodes*(p). The dangling edge ⟨κ,p⟩ encodes the function F<sup>L</sup><sub>⟨κ,p⟩</sub>, which is independent of κ only if *lvl*(p) = L, in which case we require κ = S, while we require κ ∈ {L<sub>0</sub>, H<sub>0</sub>, X} if *lvl*(p) < L. Finally, we will informally say "ESRBDD ⟨κ,p⟩" to refer to the entire graph below (and including) dangling edge ⟨κ,p⟩.

Before introducing reduced ESRBDDs and showing they are canonical, we need some terminology. We say that an ESRBDD nonterminal node q:


Note that BDDs [1] can be viewed as ESRBDDs where the edge labels are restricted to {S, X}, and a reduced BDD corresponds to an ESRBDD with no duplicate nodes and no redundant nodes. Similarly, ZDDs [8] can be viewed as ESRBDDs where edge labels are restricted to {S, H<sub>0</sub>}, and a reduced ZDD corresponds to an ESRBDD with no duplicate nodes and no high-zero nodes. Also, we note that there is no corresponding definition in the existing literature for the version of ESRBDDs where the edge labels are restricted to {S, L<sub>0</sub>}.

**Fig. 2.** Replacement rules for patterns in Fig. 1

**Definition 2.** An ESRBDD is *reduced* if the following restrictions hold:


The last restriction disallows edges ⟨H<sub>0</sub>,**0**⟩ and ⟨L<sub>0</sub>,**0**⟩ in the reduced ESRBDD. This is because F<sup>n</sup><sub>⟨H<sub>0</sub>,**0**⟩</sub> ≡ F<sup>n</sup><sub>⟨L<sub>0</sub>,**0**⟩</sub> ≡ F<sup>n</sup><sub>⟨X,**0**⟩</sub> ≡ **0**, and since we want to enforce canonicity in the reduced ESRBDD, we have *arbitrarily* chosen ⟨X,**0**⟩ as the unique representation for such long edges.

#### **3.1 Reducing an ESRBDD**

An ESRBDD can be converted into a reduced ESRBDD using Algorithm 1. The algorithm first replaces any edges ⟨H<sub>0</sub>,**0**⟩ or ⟨L<sub>0</sub>,**0**⟩ with ⟨X,**0**⟩, to satisfy restriction R5. Then, it repeatedly chooses a high-zero, low-zero, redundant, or duplicate node q and eliminates it. If node q duplicates node p, then it redirects all incoming edges from q to p (line 7). Otherwise, q is a high-zero, low-zero, or redundant node, and lines 9–14 find a node d′ with *lvl*(d′) < *lvl*(q) = n − 1, and a rule κ′ ∈ {X, H<sub>0</sub>, L<sub>0</sub>} such that F<sup>n</sup><sub>⟨S,q⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨κ′,d′⟩</sub>(x<sub>1:n</sub>). Note that a short edge to node q becomes a long edge to node d′ because *lvl*(d′) < *lvl*(q). For the special case of d′ = **0**, *any* edge to q is equivalent to edge ⟨X,**0**⟩, so the algorithm replaces those edges (line 16).

When d′ ≠ **0**, we have F<sup>n</sup><sub>⟨S,q⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨κ′,d′⟩</sub>(x<sub>1:n</sub>) for n = *lvl*(q) + 1, and these edges are replaced in line 18. It follows that F<sup>n</sup><sub>⟨κ′,q⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨κ′,d′⟩</sub>(x<sub>1:n</sub>) for n > *lvl*(q) + 1; these replacements are made in line 19. For rules κ ∈ {X, H<sub>0</sub>, L<sub>0</sub>} with κ ≠ κ′, we cannot replace ⟨κ,q⟩ with a single long edge to node d′, because the edge needs different reduction rules: the κ rule is needed above level *lvl*(q), and the κ′ rule is needed from level *lvl*(q) down. So lines 21–27 of the algorithm create a new node q′ at level *lvl*(q) + 1, of the appropriate shape such that F<sup>n</sup><sub>⟨κ,q⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨S,q′⟩</sub>(x<sub>1:n</sub>) for n = *lvl*(q′) + 1. It then follows that F<sup>n</sup><sub>⟨κ,q⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨κ,q′⟩</sub>(x<sub>1:n</sub>) for n > *lvl*(q′) + 1. These replacements are made in line 28, where the replacement ⟨κ,q′⟩ is used for long edges, and ⟨S,q′⟩ is used for short edges.

In the above discussion, any edge that is replaced by the algorithm encodes the same function as its replacement, giving us the following lemma.

**Lemma 1.** In Algorithm 1, each edge replacement preserves the function encoded by the ESRBDD ⟨κ,p⟩.

It remains to show that the algorithm always terminates.

**Lemma 2.** Algorithm 1 terminates in O(|*Nodes*(p)|) steps.

**Proof:** The proof is based on the observation that, at every iteration of the algorithm, a node q is chosen to be processed (line 5), at most two nodes are created at level *lvl*(q) + 1 (line 21), and node q is removed (line 29). These new nodes (q′ on line 21), by construction, satisfy one of the following patterns:

– q′[0] = q′[1] = ⟨κ′,d′⟩, where d′ ≠ **0**, and κ′ ∈ {H<sub>0</sub>, L<sub>0</sub>},

– q′[0] = ⟨X,**0**⟩, and q′[1] = ⟨κ′,d′⟩, where d′ ≠ **0**, and κ′ ∈ {X, H<sub>0</sub>},

– q′[0] = ⟨κ′,d′⟩, and q′[1] = ⟨X,**0**⟩, where d′ ≠ **0**, and κ′ ∈ {X, L<sub>0</sub>}.

These nodes are neither redundant, high-zero, nor low-zero, but they could be duplicates. Since the elimination of duplicate nodes (line 7) does not create new nodes, the two nodes created at *lvl*(q) + 1 result in at most two additional iterations of the algorithm. Therefore, for every node in the original ESRBDD, the algorithm iterates at most three times.

**Theorem 1.** Algorithm 1 converts ESRBDD ⟨κ,p⟩ to an equivalent reduced ESRBDD in O(|*Nodes*(p)|) steps.

**Proof:** Lemma 2 establishes that Algorithm 1 terminates in O(|*Nodes*(p)|) steps. Based on the condition of the while loop, when the loop terminates, we know that the ESRBDD contains no high-zero, low-zero, redundant, or duplicate nodes. From line 3 and the fact that the algorithm never adds an edge of the form ⟨H<sub>0</sub>,**0**⟩ or ⟨L<sub>0</sub>,**0**⟩, we conclude that when Algorithm 1 terminates, any edge to terminal node **0** must have edge rule S or X. Therefore, when the algorithm terminates, the ESRBDD is reduced. Lemma 1 establishes that Algorithm 1 produces an equivalent (in terms of encoded function) ESRBDD.

While we have established that Algorithm 1 always terminates and produces a reduced ESRBDD, we have not yet established that the algorithm produces the *same* reduced ESRBDD, regardless of the order in which nodes are chosen in line 5. This is guaranteed by the canonicity property, discussed next. Additionally, we note here that, unlike most other decision diagrams (including BDDs, ZDDs, CBDDs, CZDDs, and TBDDs), a reduced ESRBDD is not necessarily a minimum size ESRBDD encoding of a function, even for a fixed variable order, as elimination of some node q during the reduction could trigger the creation of two new nodes. An example of this is shown in Fig. 3, where redundant node q is eliminated. Edges ⟨S,q⟩ and ⟨X,q⟩ can be simply redirected as ⟨X,p⟩, but the ⟨H<sub>0</sub>,q⟩ and ⟨L<sub>0</sub>,q⟩ edges require the creation of two new nodes q<sub>H0</sub> and q<sub>L0</sub>.
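The need for the extra nodes can be checked by brute force on a small instance. The sketch below is an illustration under assumptions, not a reproduction of Fig. 3: edges are pairs `(rule, node)`, nodes are tuples `(level, low_edge, high_edge)`, p is taken to be a level-1 node encoding x<sub>1</sub>, and q is a redundant level-2 node above it. The shapes chosen for `q_h0` and `q_l0` follow the patterns listed in the proof of Lemma 2.

```python
def ev(edge, n, x):
    """F^n for an edge (rule, node); nodes are (level, low_edge, high_edge)."""
    rule, p = edge
    level = 0 if isinstance(p, int) else p[0]
    if level == n:
        return p if n == 0 else ev(p[2] if x[n] else p[1], n - 1, x)
    if rule == "S":
        raise ValueError("a short edge cannot skip a level")
    if rule == "H0" and x[n]:
        return 0
    if rule == "L0" and not x[n]:
        return 0
    return ev(edge, n - 1, x)      # X, or the surviving branch of H0 / L0

def same_function(e1, e2, n):
    inputs = [(None,) + tuple((i >> k) & 1 for k in range(n)) for i in range(2 ** n)]
    return all(ev(e1, n, x) == ev(e2, n, x) for x in inputs)

p = (1, ("S", 0), ("S", 1))                 # assumed example node: encodes x_1
q = (2, ("S", p), ("S", p))                 # redundant: both edges are <S,p>
q_h0 = (3, ("X", p), ("X", 0))              # replaces q under an H0 edge
q_l0 = (3, ("X", 0), ("X", p))              # replaces q under an L0 edge

print(same_function(("X", q), ("X", p), 4))           # X edges are just redirected
print(any(same_function(("H0", q), (r, p), 4) for r in ("X", "H0", "L0")))
print(same_function(("H0", q), ("H0", q_h0), 4))      # the new node restores equality
print(same_function(("L0", q), ("L0", q_l0), 4))
```

In this instance no single long edge to p reproduces the function of ⟨H<sub>0</sub>,q⟩, which is why a new node one level above q is required.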

While the "chaotic" non-deterministic reduction procedure in Algorithm 1 is handy in proving termination under the most general conditions, in practice we utilize a deterministic depth-first version of this algorithm that reduces a node only after having reduced its children.

#### **3.2 Canonicity of Reduced ESRBDDs**

We are now ready to discuss the *canonicity* of reduced ESRBDDs, i.e., to show that a function has a unique encoding as a reduced ESRBDD. In the following, we say that functions F<sup>n</sup><sub>⟨κ,p⟩</sub> and F<sup>n</sup><sub>⟨κ′,p′⟩</sub> are *equivalent*, written F<sup>n</sup><sub>⟨κ,p⟩</sub> ≡ F<sup>n</sup><sub>⟨κ′,p′⟩</sub>, if F<sup>n</sup><sub>⟨κ,p⟩</sub>(x<sub>1:n</sub>) = F<sup>n</sup><sub>⟨κ′,p′⟩</sub>(x<sub>1:n</sub>) for all possible inputs (x<sub>1:n</sub>) ∈ B<sup>n</sup>.

**Fig. 3.** A worst-case example where elimination of node q creates two nodes.

**Theorem 2.** In a reduced ESRBDD, for any n ∈ N, for any two edges e = ⟨κ,p⟩, e′ = ⟨κ′,p′⟩ with *lvl*(p) ≤ n, *lvl*(p′) ≤ n, if F<sup>n</sup><sub>e</sub> ≡ F<sup>n</sup><sub>e′</sub> then (1) p = p′, and (2) if *lvl*(p) < n then κ = κ′.

**Proof:** The proof is by induction on n. For the base case, we use n = 0 and from the definition of F we have F<sup>0</sup><sub>e</sub> ≡ F<sup>0</sup><sub>e′</sub> → p = p′.

Now, suppose the theorem holds for n = m, where m ≥ 0; we will prove it holds for n = m + 1. Regardless of ⟨κ,p⟩, we have

$$F^{n}\_{\langle \kappa, p \rangle}(x\_{1:n}) = (x\_n) \text{ ? } f\_1(x\_{1:n-1}) : f\_0(x\_{1:n-1})$$

for some functions f<sup>0</sup> and f1. Similarly, we have

$$F^{n}\_{\langle \kappa', p' \rangle}(x\_{1:n}) = (x\_n) ? f\_1'(x\_{1:n-1}) ! f\_0'(x\_{1:n-1}) . .$$

It follows that F <sup>n</sup> κ,p <sup>≡</sup> <sup>F</sup> <sup>n</sup> κ-,p if and only if <sup>f</sup><sup>0</sup> <sup>≡</sup> <sup>f</sup> <sup>0</sup> and <sup>f</sup><sup>1</sup> <sup>≡</sup> <sup>f</sup> 1.

First, suppose *lvl*(p) = n and *lvl*(p′) = n. From the definition of F, it follows that F<sup>n−1</sup><sub>p[0]</sub> ≡ F<sup>n−1</sup><sub>p′[0]</sub> and F<sup>n−1</sup><sub>p[1]</sub> ≡ F<sup>n−1</sup><sub>p′[1]</sub>. By inductive hypothesis, p[0].node = p′[0].node and p[1].node = p′[1].node. If *lvl*(p[0].node) < n − 1, then by inductive hypothesis, p[0] = p′[0]; otherwise, *lvl*(p[0].node) = n − 1 and we must have p[0].rule = S and p′[0].rule = S, thus p[0] = p′[0]. By a similar argument, it follows that p[1] = p′[1]. We therefore have either that p = p′ and the theorem holds, or that p duplicates p′, which is impossible because of restriction R1.

Next, suppose *lvl*(p) < n and *lvl*(p′) < n. If κ = κ′, then in all cases for F we conclude that F<sup>n−1</sup><sub>⟨κ,p⟩</sub> ≡ F<sup>n−1</sup><sub>⟨κ′,p′⟩</sub> and by inductive hypothesis we have that p = p′, so the theorem holds. We now show that κ ≠ κ′ is impossible, by contradiction. Consider the possible cases for κ ≠ κ′:


In all cases, we conclude that F<sup>n−1</sup><sub>⟨κ,p⟩</sub> ≡ **0** and F<sup>n−1</sup><sub>⟨κ′,p′⟩</sub> ≡ **0**. By the inductive hypothesis, we have that p = **0** and p′ = **0**. According to R5, if p = **0** then κ cannot be L<sub>0</sub> or H<sub>0</sub>. But this implies κ = X and κ′ = X, contradicting our assumption that κ ≠ κ′.

Finally, suppose *lvl*(p) = n and *lvl*(p′) < n (the case *lvl*(p) < n and *lvl*(p′) = n is symmetric). We show that this is impossible, by contradiction. Consider the possible cases for κ′:


The canonicity result establishes that, regardless of how an ESRBDD is constructed for a given function, the resulting reduced ESRBDD is guaranteed to be unique (assuming a given variable order). Thus, we can determine in constant time whether two functions encoded as reduced ESRBDDs are equivalent (as is already the case for reduced ordered BDDs and ZDDs). From now on, unless otherwise specified, we assume that all ESRBDDs are reduced.
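The constant-time equivalence check relies on each distinct node being stored exactly once. The Python sketch below illustrates that idea with a generic unique table (hash-consing). The names `UniqueTable`, `make_node`, and `same_function`, as well as the tuple-based edge representation, are illustrative assumptions rather than the authors' implementation, and the sketch deliberately omits the reduction rules of Algorithm 1.

```python
from typing import Dict, Tuple

# Assumed representation: an edge is (rule, node_id), with rule in {"S", "X", "H0", "L0"}.
Edge = Tuple[str, int]


class UniqueTable:
    """Hash-consing table: structurally identical nodes share one id."""

    def __init__(self) -> None:
        # Ids 0 and 1 are reserved for the two terminal nodes.
        self._ids: Dict[Tuple[int, Edge, Edge], int] = {}
        self._next_id = 2

    def make_node(self, level: int, low: Edge, high: Edge) -> int:
        key = (level, low, high)
        if key not in self._ids:  # first request for this (level, low, high) triple
            self._ids[key] = self._next_id
            self._next_id += 1
        return self._ids[key]


def same_function(e1: Edge, e2: Edge) -> bool:
    """For canonical diagrams over the same variables, compare root edges only."""
    return e1 == e2


table = UniqueTable()
a = table.make_node(1, ("S", 0), ("S", 1))
b = table.make_node(1, ("S", 0), ("S", 1))  # duplicate request returns the same id
c = table.make_node(1, ("S", 1), ("S", 0))
print(a == b, a == c)
```

The comparison touches only two small tuples, independent of diagram size, which is what makes the check constant time once both diagrams are reduced and share one table.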

#### **3.3 Comparing ESRBDDs to Other Types of Decision Diagrams**

For the remainder of the paper, we consider the relative size of the different types of DD based on the interpretation of long edges, namely, BDDs, ZDDs, CBDDs, CZDDs, TBDDs, and ESRBDDs. We also consider ESRBDDs without the L<sub>0</sub> edge label, denoted ESRBDD−L<sub>0</sub>. These are summarized in Table 1: some entries (comparisons between BDDs, ZDDs, CBDDs, and CZDDs) are known from prior work [2,6], some entries are discussed below, and some entries are unknown. Entry [T<sub>1</sub>, T<sub>2</sub>] describes the worst-case increase in the number of nodes, as a multiplicative factor. More formally, it is the bound for "number of nodes required to encode f using T<sub>2</sub>" divided by "number of nodes required to encode f using T<sub>1</sub>" for all functions f over L boolean variables. Note that the node counts always include both terminal nodes. A factor of 1 indicates that type T<sub>1</sub> cannot require fewer nodes than type T<sub>2</sub>.

**Table 1.** Worst-case relative increase when converting one DD type into another.

First, we discuss how an arbitrary BDD can be converted into a TBDD or ESRBDD, and fill in the BDD row in Table 1. To build a TBDD from a BDD, every edge to a non-terminal node p in the BDD is annotated with the level tag *lvl*(p). By definition, any such annotated edge in a TBDD implies BDD reductions for the skipped levels. A TBDD thus constructed is no larger than the BDD, and may be further reduced (since it could contain high-zero nodes) by applying the TBDD reduction described in [10]. Similarly, we can annotate long edges in the BDD with X (Fig. 4(a)), and short edges with S, to obtain an unreduced ESRBDD. We then apply Algorithm 1. We now show that this will not increase the ESRBDD size, and thus the resulting ESRBDD cannot be larger than the original BDD.

**Lemma 3.** Suppose we have an unreduced ESRBDD where, for every node q, there exists a rule κ ∈ {X, H<sub>0</sub>, L<sub>0</sub>} such that every edge to q is either ⟨S,q⟩ or ⟨κ,q⟩. Then reducing the ESRBDD will not increase the number of nodes.

**Proof:** Apply Algorithm 1 and in line 5, always choose a node at the lowest level. Then, when a node q is chosen, all incoming edges to q will be labeled either with S or with κ. The ⟨S,q⟩ edges will not cause any node to be created. The ⟨κ,q⟩ edges will cause at most one node to be created. But then node q is removed. Thus, the overall number of nodes cannot increase.

It is also easy to convert a ZDD into a TBDD or ESRBDD. To obtain a TBDD, annotate every edge from non-terminal node p with the level tag *lvl*(p), so that ZDD reductions are used for all the edges; then reduce the TBDD. To obtain an ESRBDD, annotate long edges in the ZDD with H0, see Fig. 4(b), and short edges with S, and apply Algorithm 1.

The conversion from a chained DD to an unreduced ESRBDD is illustrated in Fig. 4(c) and (d). For each chain node x<sub>k</sub> : x<sub>i</sub> with x<sub>k</sub> > x<sub>i</sub>, create a "top node" with variable x<sub>k</sub>, and a "bottom node" with variable x<sub>i</sub> that is only pointed to by its corresponding top node. In a CBDD, the top node will be a high-zero node, and all top nodes and non-chained nodes will have incoming edges labeled with X or S. In a CZDD, the top node will be a redundant node, and all top nodes and non-chained nodes will have incoming edges labeled with H<sub>0</sub> or S. At worst, the unreduced ESRBDD has twice the nodes of the original CBDD or CZDD and, from Lemma 3, reducing this ESRBDD does not increase its size.

**Fig. 4.** Converting to ESRBDDs.

In a TBDD, each edge can be characterized as short, purely X, purely H<sub>0</sub>, or partly X and partly H<sub>0</sub>. To convert into an ESRBDD, the short edges are labeled with S, the purely X edges are labeled with X, and the purely H<sub>0</sub> edges are labeled with H<sub>0</sub>. Edges that are partly X and partly H<sub>0</sub> require the addition of a node at the level where the reduction rule changes, as shown in Fig. 4(e). The worst case occurs when *every* edge requires such a node. Then, since every TBDD node has two outgoing edges, the resulting unreduced ESRBDD will have triple the number of nodes. Since all of the introduced nodes have incoming X edges, and all other nodes have incoming S or H<sub>0</sub> edges, from Lemma 3 this ESRBDD will not increase in size when it is reduced. We note here that, if there are some purely X edges in the TBDD, then Lemma 3 no longer applies; however, the number of nodes that would be added during reduction is no more than the number of nodes saved by not having to introduce a node on the purely X edges.

We now consider converting from ESRBDDs into the other DD types. In the case where L<sub>0</sub> edges are not allowed (row ESRBDD−L<sub>0</sub> in Table 1), the worst-case BDD is from ESRBDD ⟨H<sub>0</sub>,**1**⟩ and the worst-case ZDD is from ESRBDD ⟨X,**1**⟩. In both cases, the ESRBDD has 2 nodes, while the resulting BDD/ZDD has L + 2 nodes, giving ratios of L/2 + o(L), similar to the discussion in [6, p. 250]. The example ZDD in [2], which produces a CBDD with three times as many nodes, can be converted into an ESRBDD of the same size. Similarly, the example BDD in [2], which produces a CZDD with twice as many nodes, can be converted into an ESRBDD of the same size. Any ESRBDD without L<sub>0</sub> edges can be converted into a TBDD by labeling X edges with a level tag such that the X rule is always applied, and labeling H<sub>0</sub> edges with a level tag such that the H<sub>0</sub> rule is always applied. Therefore, the TBDD cannot be larger than the ESRBDD.

An ESRBDD−L<sub>0</sub> can be converted into an ESRBDD by running Algorithm 1 to eliminate any low-zero nodes. For each low-zero node that is eliminated, we could have an incoming X and H<sub>0</sub> edge, causing the creation of two nodes. Suppose we eliminate n low-zero nodes that cause creation of two nodes. Then, because each such low-zero node must have 2 incoming edges, we must have 2n incoming edges to these nodes. Above, we must have at least 2n − 1 nodes to produce these edges. We could then "stack" such a pattern m times. This gives an ESRBDD with m(n + 2n − 1) + 2 = m(3n − 1) + 2 nodes, and a reduced ESRBDD with m(2n + 2n − 1) + 2 = m(4n − 1) + 2 nodes. This ratio is bounded above by 3/2, which is approached when n = 1 and m goes to infinity.
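The two node counts for the stacked pattern can be checked numerically. The following Python sketch evaluates m(3n − 1) + 2 and m(4n − 1) + 2 exactly as stated in the text and shows the ratio approaching 3/2 for n = 1; the function names are illustrative only.

```python
from fractions import Fraction
from typing import Tuple


def stacked_sizes(n: int, m: int) -> Tuple[int, int]:
    """Node counts before and after reduction for the pattern stacked m times."""
    before = m * (n + 2 * n - 1) + 2      # m(3n - 1) + 2
    after = m * (2 * n + 2 * n - 1) + 2   # m(4n - 1) + 2
    return before, after


def ratio(n: int, m: int) -> Fraction:
    """Exact ratio (nodes after reduction) / (nodes before reduction)."""
    before, after = stacked_sizes(n, m)
    return Fraction(after, before)


for m in (1, 10, 1000):
    print(m, stacked_sizes(1, m), float(ratio(1, m)))
```

For n = 1 the ratio is (3m + 2)/(2m + 2), which stays strictly below 3/2 for every finite m; larger n gives a smaller limit, (4n − 1)/(3n − 1).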

For the case of ESRBDDs with all types of edges (row ESRBDD in Table 1), the L<sub>0</sub> edge allows us to build different worst cases. Consider an ESRBDD ⟨S,p⟩ where *lvl*(p) = L, p[0] = ⟨H<sub>0</sub>,**1**⟩, and p[1] = ⟨L<sub>0</sub>,**1**⟩. This ESRBDD has 3 nodes.


**Table 2.** Numbers of nodes for dictionary benchmarks.

Because BDDs cannot exploit H<sub>0</sub> or L<sub>0</sub> edges, this will produce a BDD with 2(L − 1) + 3 = 2L + 1 nodes, giving a worst-case ratio of 2L/3. The ZDD worst case is similar, using instead p[0] = ⟨X,**1**⟩. Finally, for DD types that can exploit both X and H<sub>0</sub> edges, the ESRBDD ⟨L<sub>0</sub>,**1**⟩ corresponds to the worst case: the CBDD, CZDD, TBDD, and ESRBDD−L<sub>0</sub> will all require L + 2 nodes.

### **4 Experimental Results**

We compare the performance of QBDDs (with long edges to **0**), BDDs, ZDDs, CBDDs, CZDDs, TBDDs, and ESRBDDs on three sets of benchmarks. The first two benchmarks are similar to those used in [2], and are representative of general textual information and digital logic functions, respectively. The third benchmark is typical in state space analysis of concurrent systems.

#### **4.1 Dictionaries**

A dictionary can be encoded as an indicator function over the set of strings of a given length from either the compact alphabet consisting of the distinct symbols found in the dictionary plus NULL, or the full alphabet of all 128 ASCII characters (to ensure that all encoded strings have the same length, shorter ones are padded with the ASCII symbol NULL). We use the encoding schemes described in [2]: *one-hot* and *binary*. Therefore, each dictionary generates four benchmarks, one for each choice of encoding and alphabet.

We compare the different DD types on two dictionaries. The first one is the English words in file /usr/share/dict/words under MacOS, containing 235,886 words with lengths ranging from 1 to 24. Its compact alphabet contains lower and upper case letters plus hyphen and NULL (54 in total). The second one is a set of passwords from SecLists [7] (non-ASCII characters are replaced with NULL), containing 999,999 passwords with lengths ranging from 1 to 39. Its compact alphabet consists of 91 symbols including NULL.
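To make the two encodings concrete, here is a minimal Python sketch that maps a padded word to a bit vector. It assumes that one-hot encoding uses one boolean variable per (position, symbol) pair and that binary encoding uses the minimal fixed number of bits per position; the exact bit order and variable layout in [2] may differ, and the helper names are hypothetical.

```python
from typing import List, Sequence

NULL = "\0"  # padding symbol


def pad(word: str, length: int) -> str:
    """Pad a word with NULL so that all encoded strings have the same length."""
    return word + NULL * (length - len(word))


def one_hot(word: str, alphabet: Sequence[str], length: int) -> List[int]:
    """One variable per (position, symbol); exactly one bit set per position."""
    bits: List[int] = []
    for ch in pad(word, length):
        idx = alphabet.index(ch)
        bits.extend(1 if i == idx else 0 for i in range(len(alphabet)))
    return bits


def binary(word: str, alphabet: Sequence[str], length: int) -> List[int]:
    """Fixed-width binary code of the symbol index, most significant bit first."""
    width = max(1, (len(alphabet) - 1).bit_length())
    bits: List[int] = []
    for ch in pad(word, length):
        idx = alphabet.index(ch)
        bits.extend((idx >> b) & 1 for b in reversed(range(width)))
    return bits


alphabet = [NULL, "a", "b", "c"]  # toy compact alphabet, not taken from the benchmarks
print(one_hot("ab", alphabet, 3))
print(binary("ab", alphabet, 3))
```

Under these assumptions, the English word list (24 positions, 54 symbols) would need 24 × 54 = 1296 variables with one-hot encoding and 24 × 6 = 144 with binary encoding. The one-hot vectors are very sparse, which is the situation that zero-suppression targets.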


**Table 3.** Numbers of nodes for combinational circuit benchmarks.

Table 2 reports the number of nodes required to store each dictionary, according to different encodings and alphabets (the best result on each row is in boldface). Except for QBDDs and BDDs, the one-hot encoding results in fewer nodes, demonstrating the effectiveness of the zero-suppressed idea when encoding large, sparse data. Among the DD types we consider, ESRBDDs have the fewest nodes, regardless of encoding and alphabet. For binary encodings, ESRBDDs use 19%–39% fewer nodes than TBDDs, the second best choice. With one-hot encodings, ZDDs, CZDDs, TBDDs, and ESRBDDs tie for best because (a) there are no redundant nodes and (b) any low-zero nodes that are eliminated do not cause an overall decrease in the number of nodes in the ESRBDDs. Indeed, redundant nodes are rare even with binary encodings, as they arise when two words w<sub>1</sub> and w<sub>2</sub> not only have bit patterns that differ in a position, but also share all their possible continuations, i.e., w<sub>1</sub>w is a word if and only if w<sub>2</sub>w is also a word, for all w. In the English word list, "Hlidhskjalf" and its alternate spelling "Hlithskjalf" are one such rare instance (note that no w can continue either of them to form an additional word).

#### **4.2 Combinational Circuits**

BDDs are commonly used to synthesize and verify digital circuits. We select a set of combinational circuits from the LGSynth'91 benchmarks [11] and, for each circuit, we build a DD encoding all its output logic functions. For each circuit, the variable order is determined using Sifting [9] while building the BDD.

Table 3 reports the number of nodes needed to encode all outputs of each circuit. In contrast to the dictionaries, these benchmarks place importance on the ability to eliminate redundant nodes. Thus, QBDDs and ZDDs have the worst performance. TBDDs and ESRBDDs are always the two best representations, and the difference between them is less than 0.7%.

#### **4.3 Safe Petri Nets**

Decision diagrams are frequently used in symbolic model checking to represent sets of states. We have selected a set of 37 *safe* Petri nets from the 2018 Model Checking Contest (https://mcc.lip6.fr/2018/). A Petri net is safe if each one of its places can contain at most one token; each place can, therefore, be mapped directly to a boolean variable. Most of these models have scaling parameters that affect their size and complexity, yielding N = 103 model instances.

**Table 5.** Number of nodes for a subset of the safe Petri net benchmarks.

Providing detailed results for all the model instances would require excessive space, so to summarize over all model instances, Table 4 shows a score for each DD type i. The score is the geometric mean [4]:

$$score(i) = \sqrt[N]{\prod\_{n=1}^{N} \frac{T\_i(n)}{T\_{min}(n)}}$$

where N is the total number of model instances, T<sub>i</sub>(n) is the number of nodes needed to represent the state space of instance n using DD type i, and T<sub>min</sub>(n) is the smallest number of nodes needed to represent the state space of instance n by any of the DD types we consider. ESRBDDs have by far the smallest overall score, barely larger than 1, indicating that they are either the smallest or slightly larger than the smallest for each model instance.
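The score can be computed directly from a table of node counts. The sketch below is one way to do so in Python; it sums logarithms rather than multiplying ratios, which is a numerical convenience and not something the text prescribes. The node counts and type names in the example are made up for illustration and are not data from the paper.

```python
import math
from typing import Dict, List


def scores(node_counts: Dict[str, List[int]]) -> Dict[str, float]:
    """Geometric mean of T_i(n) / T_min(n) over all model instances n."""
    types = list(node_counts)
    num_instances = len(node_counts[types[0]])
    # T_min(n): smallest node count for instance n over all DD types considered.
    t_min = [min(node_counts[t][n] for t in types) for n in range(num_instances)]
    result: Dict[str, float] = {}
    for t in types:
        # Summing logs avoids overflow or underflow when N is large.
        log_sum = sum(math.log(node_counts[t][n] / t_min[n])
                      for n in range(num_instances))
        result[t] = math.exp(log_sum / num_instances)
    return result


# Hypothetical node counts for two DD types on two instances.
example = {"A": [100, 400], "B": [200, 100]}
print(scores(example))
```

A score of exactly 1 would mean the type is the smallest on every instance; here type A is smallest on the first instance but four times larger on the second, giving a score of 2.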

Table 5 shows T<sub>i</sub>(n) for model instances n that required more than 250,000 nodes in the QBDD representation. For parameterized models that had multiple model instances satisfying this criterion, we present data for only the largest such model instance. We have also included the results for *DiscoveryGPU*, the only model where ESRBDDs were not the best (they were a close second).

#### **4.4 Memory Considerations: The Size of Nodes**

So far, we have compared DD types based on how many nodes they require. However, the actual memory consumption also depends on the size of the respective nodes. All of these DDs store two child pointers. In addition, BDDs and ZDDs store a level, CBDDs and CZDDs store two levels, TBDDs store three levels, while ESRBDDs store a level and two edge rules. Since all short edges must be labeled by S, it is only necessary to label the long edges, and this requires log<sub>2</sub> n bits per edge if there are n non-S reduction rules. Without L<sub>0</sub> edges, a single bit distinguishes H<sub>0</sub> from X; otherwise, two bits are required for rules {H<sub>0</sub>, L<sub>0</sub>, X}. QBDD nodes are therefore the smallest (typically requiring 64 or 128 bits, when 32-bit or 64-bit pointers are used, respectively) and Table 6 indicates the *additional* cost required for each node type, when the level integers are stored using 16 bits (as suggested by [2]), 20 bits (as suggested by [10]), and 32 bits.

**Table 6.** Overhead of node sizes (bits per node) as compared to QBDD nodes.
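The per-node overhead described above is simple arithmetic, sketched here in Python. The numbers are derived only from the prose (number of stored levels plus rule bits for two outgoing edges), with the bit count per edge rounded up to a whole number of bits; this rounding and the function name `extra_bits` are assumptions, and the values are not copied from Table 6.

```python
from math import ceil, log2


def extra_bits(dd_type: str, level_bits: int) -> int:
    """Bits stored per node beyond the two child pointers of a QBDD node."""
    stored_levels = {"QBDD": 0, "BDD": 1, "ZDD": 1, "CBDD": 2, "CZDD": 2,
                     "TBDD": 3, "ESRBDD-L0": 1, "ESRBDD": 1}[dd_type]
    # Non-S rules that can label a long edge: {X, H0} or {X, H0, L0}.
    long_rules = {"ESRBDD-L0": 2, "ESRBDD": 3}.get(dd_type, 0)
    # Two outgoing edges per node, each needing ceil(log2(#rules)) bits.
    rule_bits = 2 * ceil(log2(long_rules)) if long_rules else 0
    return stored_levels * level_bits + rule_bits


for bits in (16, 20, 32):
    print(bits, {t: extra_bits(t, bits)
                 for t in ("BDD", "CBDD", "TBDD", "ESRBDD-L0", "ESRBDD")})
```

With 16-bit levels this gives 16 extra bits for BDDs and ZDDs, 32 for chained DDs, 48 for TBDDs, and 18 or 20 for ESRBDDs without or with L<sub>0</sub> edges.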

ESRBDDs are clearly more memory efficient than CBDDs, CZDDs and TBDDs. There are a few instances in our experiments where TBDDs use marginally fewer nodes than ESRBDDs (less than 3.2% fewer nodes in every such instance), but not enough to overcome their per-node memory overhead.

#### **5 Conclusions**

We have shown that ESRBDDs are a simple, yet efficient, generalization of previous attempts at combining reduction rules. Unlike previous efforts, they are not biased towards any particular reduction rule and therefore eliminate the need for the user to prioritize the reduction rules. They also provide a framework for further generalizations through additional reduction rules, for example "high-one" and "low-one", the duals of "low-zero" and "high-zero" respectively.

ESRBDDs allow users to select a subset of reduction rules that suit their needs, and make it possible to integrate domain-specific reduction rules (a common phenomenon) with a subset of existing ones. ESRBDD nodes are also more compact than all previous such efforts, and new reduction rules can be added at a small cost: log<sub>2</sub> n bits per edge, where n is the number of reduction rules. Our future efforts will be directed towards adapting BDD manipulation operations (such as *Apply*) to work with the reduction rules in ESRBDDs, and towards including complement edges and other reduction rules, such as "high-one", "low-one", or "identity" reductions, while maintaining canonicity.

**Acknowledgments.** This work was supported in part by National Science Foundation grant ACI-1642397.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Effective Entailment Checking for Separation Logic with Inductive Definitions**

Jens Katelaan<sup>1(✉)</sup>, Christoph Matheja<sup>2</sup>, and Florian Zuleger<sup>1</sup>

<sup>1</sup> TU Wien, Vienna, Austria, jkatelaan@forsyte.at

<sup>2</sup> RWTH Aachen University, Aachen, Germany

**Abstract.** Symbolic-Heap Separation logic is a popular formalism for automated reasoning about heap-manipulating programs, which allows the user to give customized data structure definitions.

In this paper, we give a new decidability proof for the separation logic fragment of Iosif, Rogalewicz and Simacek. We circumvent the reduction to MSO from their proof and provide a direct model-theoretic construction with elementary complexity. We implemented our approach in the Harrsh analyzer and evaluate its effectiveness. In particular, we show that Harrsh can decide the entailment problem for data structure definitions for which no previous decision procedures have been implemented.

### **1 Introduction**

Separation logic (SL) [12,18] is a popular formalism for Hoare-style verification of imperative, heap-manipulating programs. In particular, the *symbolic heap* separation logic fragment has received a lot of attention: Symbolic heaps serve as the basis of various automated verification tools, such as Infer [6], Sleek [7], Songbird [19], GRASSHopper [17], VCDryad [16], VeriFast [13], SLS [20], and Spen [9]. Many of the aforementioned tools rely on *systems of inductive predicate definitions* (SID) that serve as specifications of dynamic data structures, e.g., linked lists and trees.

At the heart of every Hoare-style verification procedure based on separation logic lies the *entailment problem*: Given two SL formulas, say ϕ and ψ, is every model of ϕ also a model of ψ? While the entailment problem is undecidable in general [2], there are various approaches to decide entailments between symbolic heaps ranging from complete methods for fixed SIDs [3], over decision procedures for restricted classes of SIDs [10,11], to incomplete approaches, such as fold/unfold reasoning [7] or cyclic proofs [5].

Among the largest decidable fragments of symbolic heaps with inductive definitions is the fragment of *symbolic heaps with bounded tree-width* (SL<sub>btw</sub>) [10]. This fragment supports a rich class of data structures in SID specifications, such as doubly-linked lists and binary trees with linked leaves. SL<sub>btw</sub> introduces three syntactic conditions on SIDs (progress, connectivity, and establishment) that enable a reduction from the entailment problem for SL<sub>btw</sub> to the (decidable) satisfiability problem for monadic second-order logic (MSO) over graphs of bounded tree width. This gives rise to a decision procedure of non-elementary complexity, at least without an in-depth analysis of the quantifier alternations involved in the reduction. The reduction to MSO is also technically involved and has, to the best of our knowledge, never been implemented. The authors remark in the follow-up paper [11] that "the method from [10] causes a blowup of several exponentials in the size of the input problem and is unlikely to produce an effective decision procedure."

**Fig. 1.** An SID Φ with three predicates for binary trees with parent pointers.

*Contributions.* We give a new proof for the decidability of the entailment problem for the SL<sub>btw</sub> fragment. In contrast to [10], we circumvent the reduction to MSO and give a direct model-theoretic construction with elementary complexity. This yields an easy-to-implement decision procedure for entailments in the full SL<sub>btw</sub> fragment. We implemented our approach in the Harrsh analyzer and report on promising results for challenging examples (Sect. 6). In particular, we show that Harrsh can decide the entailment problem for data structure definitions for which no previous decision procedures have been implemented.

*A challenging example.* To highlight the challenges faced when developing and implementing decision procedures for entailments in SL<sub>btw</sub>, consider the SID Φ consisting of the rules in Fig. 1.<sup>1</sup> There are three predicates, namely tree, rtree, and ltree, that specify binary trees with parent pointers (treep for short). The predicate tree takes two parameters representing the root of the tree and its parent. Predicates rtree and ltree both have the leftmost leaf of the tree as an additional parameter. Such a parameter may, for example, be required to specify tree segments for an automated program analysis. Although both rtree and ltree describe treeps, they take radically different approaches: Predicate rtree defines a treep starting at the root, i.e., it specifies the root of the treep and then states that both subtrees are treeps (the parameter representing the leftmost leaf is additionally passed to the left subtree). In contrast, predicate ltree specifies treeps starting at the leftmost leaf and moving up to the root. Consequently, the models of these predicates are derived in completely different ways, which is a challenge for commonly applied approaches, such as fold/unfold (cf. [7]) or inductive reasoning (cf. [5,19,20]).

<sup>1</sup> The syntax and semantics of SIDs are defined formally in Sect. 3.

**Fig. 2.** treep

In fact, the entailment ltree(x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>) ⊨ rtree(x<sub>2</sub>, x<sub>3</sub>, x<sub>1</sub>) holds, whereas the entailment rtree(x<sub>2</sub>, x<sub>3</sub>, x<sub>1</sub>) ⊨ ltree(x<sub>1</sub>, x<sub>2</sub>, x<sub>3</sub>) is violated: Intuitively, rtree admits models in which all shortest paths from the root to the leftmost leaf have length one. In contrast, for ltree, the minimal length of all shortest paths is two. Thus, the heap illustrated in Fig. 2 is a model of rtree, but not of ltree. In fact, if we rule out this model, rtree and ltree entail each other. That is, the entailment below and its converse are both valid:

$$\mathsf{ltree}(x\_1, x\_2, x\_3) \models \exists l, r \colon x\_2 \mapsto (l, r, x\_3) \ast \mathsf{rtree}(l, x\_2, x\_1) \ast \mathsf{tree}(r, x\_2) \quad (\clubsuit)$$

Harrsh solved the entailment (♣) from above in less than a second. The only other tool capable of successfully solving (♣) is Slide [11], which is based on tree automata. However, the approach in [11] is not complete for SLbtw.

*Overview of our approach.* We first present an algebra à la Courcelle [8] to systematically construct models of separation logic formulas (Sect. 2). This algebra enables us to conveniently formalize the semantics of separation logic (Sect. 3). To decide entailments, we then develop an abstraction mechanism for models with the following properties (Sect. 4):


How do we obtain a decision procedure from these properties for an entailment, say pred<sub>1</sub>(**x**<sub>1</sub>) ⊨<sub>Φ</sub> pred<sub>2</sub>(**x**<sub>2</sub>)? We iteratively compute all abstractions corresponding to models of pred<sub>1</sub>(**x**<sub>1</sub>). Due to compositionality (1), this can be achieved by applying the same operations used to construct models on previously computed abstractions until a fixed point is reached. Finiteness of the abstraction (2) ensures termination. We then exploit that the abstraction is well-defined (3) and effective (4) to decide the entailment: pred<sub>1</sub>(**x**<sub>1</sub>) ⊨<sub>Φ</sub> pred<sub>2</sub>(**x**<sub>2</sub>) holds iff all computed abstractions of models of pred<sub>1</sub>(**x**<sub>1</sub>) yield that they are also models of pred<sub>2</sub>(**x**<sub>2</sub>) (Sect. 5).
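The shape of this procedure is a standard fixed-point computation over a finite abstract domain. The Python sketch below shows only that skeleton on a toy abstraction (list-segment lengths capped at 3); the cap, the `combine` operation, and the final check are hypothetical stand-ins and have nothing to do with the abstraction developed in Sect. 4.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Set


def least_fixed_point(
    seeds: Iterable[Hashable],
    combine: Callable[[FrozenSet[Hashable]], Iterable[Hashable]],
) -> Set[Hashable]:
    """Apply `combine` to the known abstractions until nothing new appears."""
    known: Set[Hashable] = set(seeds)
    changed = True
    while changed:  # terminates whenever the abstract domain is finite
        changed = False
        for new in list(combine(frozenset(known))):
            if new not in known:
                known.add(new)
                changed = True
    return known


def combine(known: FrozenSet[int]) -> Set[int]:
    # Toy composition: concatenating two list segments adds their lengths,
    # and the abstraction forgets any length above 3.
    return {min(a + b, 3) for a in known for b in known}


abstractions = least_fixed_point({1}, combine)
# Toy "entailment check": every abstraction describes a non-empty list.
entailment_holds = all(a >= 1 for a in abstractions)
print(sorted(abstractions), entailment_holds)
```

Compositionality corresponds to `combine` working on abstractions alone, finiteness to the cap that bounds the domain, and the final `all(...)` plays the role of the effective per-abstraction check.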

*Due to space constraints, all proofs are in the supplementary material* [1].

*Notation.* The set of all (non-empty) finite sequences over a set S is S<sup>∗</sup> (S<sup>+</sup>). Bold letters denote sequences, e.g., **x** = x<sub>1</sub>,...,x<sub>k</sub>. **x**[i] refers to the i-th element of **x**. We often treat sequences as sets, i.e., we write y ∈ **x** if y occurs in **x**, **x** ∪ **z** for the set of all elements in **x** or **z**, etc. f = {x<sub>1</sub> ↦ y<sub>1</sub>,...,x<sub>n</sub> ↦ y<sub>n</sub>} is the function given by f(x<sub>i</sub>) = y<sub>i</sub> for i ∈ [1, n], n ≥ 0. Moreover, functions f : X → Y are lifted to functions on sequences f : X<sup>∗</sup> → Y<sup>∗</sup> by pointwise application.

**Fig. 3.** A heap graph modeling a list segment of length at least 5 from x<sub>1</sub> to x<sub>2</sub>.

### **2 Heap Graphs**

Separation logic is typically interpreted in terms of stack-heap pairs consisting of a stack, i.e., an evaluation of variables, and a heap, i.e., a finite mapping from memory locations to values. In our setting, however, it is more convenient to abstract from locations and consider labeled graphs.

Formally, let **Var** be a set of variables containing a special variable **null** ∈ **Var**. Moreover, let **Preds** be a set of *predicate identifiers*; each predicate pred ∈ **Preds** is equipped with an arity ar(pred) ∈ ℕ. pred(**x**) is a *predicate call* if the length of sequence **x** ∈ **Var**<sup>∗</sup> is ar(pred).

**Definition 1 (Heap Graph).** *A* heap graph M = ⟨Ptr, FV, calls⟩ *is a graph whose nodes are a finite set of variables in* **Var***. The edges of* M *are given by a partial points-to function* Ptr: **Var** \ {**null**} ⇀<sub>finite</sub> **Var**<sup>+</sup> *mapping variables to finite tuples of variables. Moreover,* FV ⊆ **Var** *is a finite set of* free variables *and* calls *is a finite set of* predicate calls*. A heap graph is* concrete *if* calls = ∅*. We collect all variables in* Ptr*,* FV*, and* calls *in* vars(M)*. Finally, we write* Ptr<sub>M</sub>*,* FV<sub>M</sub>*, and* calls<sub>M</sub> *to refer to the individual components of heap graph* M*.*

*Example 1.* Figure 3 depicts a heap graph modeling a singly-linked list of length at least five with head x<sub>1</sub> and tail x<sub>2</sub> (assuming the predicate call sll(d, x<sub>2</sub>) stands for non-empty list segments from d to x<sub>2</sub>; see the left part of Fig. 5). In our graphical notation, every node corresponds to the variable it is labeled with. Gray nodes correspond to the free variables in FV. For each variable, say x, the pointers Ptr(x) = y<sub>1</sub>,...,y<sub>k</sub> are represented by directed edges, labeled with the position 1, 2,...,k, from the node labeled with x to nodes labeled with y<sub>1</sub>,...,y<sub>k</sub>, respectively. We usually omit the edge labels if each node has at most one outgoing edge. Finally, a predicate call is drawn as a box labeled with the predicate call and connected to the nodes representing the variables occurring in the call's parameters. Formally, the heap graph in Fig. 3 is given by M = ⟨Ptr, FV, calls⟩ with points-to mapping Ptr = {x<sub>1</sub> ↦ a, a ↦ b, b ↦ c, c ↦ d}, free variables FV = {x<sub>1</sub>, x<sub>2</sub>} and predicate calls calls = {sll(d, x<sub>2</sub>)}.
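Definition 1 translates almost directly into a small data structure. The following Python sketch encodes the heap graph of Example 1; the class name, field names, and the choice of strings for variables are assumptions made for illustration and are not taken from the Harrsh implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set, Tuple

Call = Tuple[str, Tuple[str, ...]]  # (predicate identifier, argument sequence)


@dataclass
class HeapGraph:
    ptr: Dict[str, Tuple[str, ...]]  # partial points-to function Ptr
    fv: FrozenSet[str]               # free variables FV
    calls: FrozenSet[Call]           # predicate calls

    def is_concrete(self) -> bool:
        """A heap graph is concrete if it contains no predicate calls."""
        return not self.calls

    def variables(self) -> Set[str]:
        """vars(M): all variables occurring in Ptr, FV, and calls."""
        vs = set(self.fv)
        for x, targets in self.ptr.items():
            vs.add(x)
            vs.update(targets)
        for _, args in self.calls:
            vs.update(args)
        return vs


# The heap graph of Example 1: x1 -> a -> b -> c -> d, followed by sll(d, x2).
example1 = HeapGraph(
    ptr={"x1": ("a",), "a": ("b",), "b": ("c",), "c": ("d",)},
    fv=frozenset({"x1", "x2"}),
    calls=frozenset({("sll", ("d", "x2"))}),
)
print(sorted(example1.variables()), example1.is_concrete())
```

Four explicit pointers plus a non-empty `sll` segment account for the "length at least five" in the example.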

**Fig. 4.** Illustration of composition of two heap graphs.

Heap graphs are an abstraction of the classical stack-heap model. To reason about separation logic with heap graphs (and their abstractions), we need a few operations for their systematic construction: Let f : **Var** → **Var** be a partial function and f(M) its application to every variable in every component of M.

*Isomorphic heap graphs.* We call a variable x ∈ **Var** an *auxiliary variable* of heap graph M if x is not a free variable of M. Throughout this article, we do not distinguish between isomorphic heap graphs, i.e., heap graphs that are identical up to renaming of auxiliary variables. Formally, two heap graphs M<sub>1</sub> and M<sub>2</sub> are *isomorphic*, written M<sub>1</sub> ≅ M<sub>2</sub>, if there exists a bijective function f : vars(M<sub>1</sub>) → vars(M<sub>2</sub>) such that (1) FV<sub>M<sub>1</sub></sub> = FV<sub>M<sub>2</sub></sub>, (2) f(x) = x for all x ∈ FV<sub>M<sub>1</sub></sub>, and (3) f(M<sub>1</sub>) = M<sub>2</sub>.

*Renaming heap graphs.* Our first operation enables renaming of free variables. Formally, let M be a heap graph and **x** ∈ FV<sub>M</sub><sup>∗</sup>, **y** ∈ **Var**<sup>∗</sup> be repetition-free sequences of variables of the same length. Then the *renaming* of **x** to **y** in M is given by rename<sub>**x**,**y**</sub>(M) = f(M), where

$$f \colon \mathbf{Var} \to \mathbf{Var}, \quad z \mapsto \begin{cases} \mathbf{y}[i] & \text{if } \mathbf{x}[i] = z \\ z & \text{otherwise.} \end{cases}$$
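As a rough illustration of the renaming operation, the Python sketch below applies the substitution f simultaneously to all three components of a heap graph. The triple-based representation and the function name `rename` are assumptions for this example; the sketch also does not guard against a target name in **y** that already occurs elsewhere in the graph.

```python
from typing import Dict, FrozenSet, Sequence, Tuple

Ptr = Dict[str, Tuple[str, ...]]
Call = Tuple[str, Tuple[str, ...]]
HeapGraph = Tuple[Ptr, FrozenSet[str], FrozenSet[Call]]  # (Ptr, FV, calls)


def rename(m: HeapGraph, xs: Sequence[str], ys: Sequence[str]) -> HeapGraph:
    """Rename free variables xs[i] to ys[i] in every component of m."""
    ptr, fv, calls = m
    # Preconditions from the text: equal length, repetition free, xs are free variables.
    assert len(xs) == len(ys)
    assert len(set(xs)) == len(xs) and len(set(ys)) == len(ys)
    assert set(xs) <= fv
    f = dict(zip(xs, ys))

    def ap(z: str) -> str:
        return f.get(z, z)  # y[i] if x[i] == z, and z otherwise

    new_ptr = {ap(x): tuple(ap(y) for y in targets) for x, targets in ptr.items()}
    new_fv = frozenset(ap(z) for z in fv)
    new_calls = frozenset((p, tuple(ap(a) for a in args)) for p, args in calls)
    return new_ptr, new_fv, new_calls


m = ({"x": ("y",)}, frozenset({"x", "y"}), frozenset({("sll", ("y", "x"))}))
print(rename(m, ["x"], ["z"]))
print(rename(m, ["x", "y"], ["y", "x"]))  # simultaneous swap
```

Because the substitution is applied once per occurrence, swapping two variables works without a temporary name.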

*Composition.* Our next operation allows composing heap graphs by "gluing" them together at their common free variables. Formally, let M<sub>1</sub>, M<sub>2</sub> be heap graphs such that (1) vars(M<sub>1</sub>) ∩ vars(M<sub>2</sub>) ⊆ FV<sub>M<sub>1</sub></sub> ∩ FV<sub>M<sub>2</sub></sub> and (2) Ptr<sub>M<sub>1</sub></sub> and Ptr<sub>M<sub>2</sub></sub> are domain disjoint, i.e., dom(Ptr<sub>M<sub>1</sub></sub>) ∩ dom(Ptr<sub>M<sub>2</sub></sub>) = ∅. Then the componentwise union M<sub>1</sub> ∪ M<sub>2</sub> of M<sub>1</sub> and M<sub>2</sub> is ⟨Ptr<sub>M<sub>1</sub></sub> ∪ Ptr<sub>M<sub>2</sub></sub>, FV<sub>M<sub>1</sub></sub> ∪ FV<sub>M<sub>2</sub></sub>, calls<sub>M<sub>1</sub></sub> ∪ calls<sub>M<sub>2</sub></sub>⟩. Otherwise, M<sub>1</sub> ∪ M<sub>2</sub> is undefined. We then define the composition M<sub>1</sub> • M<sub>2</sub> of heap graphs M<sub>1</sub>, M<sub>2</sub> as

$$
\mathcal{M}\_1 \bullet \mathcal{M}\_2 = \begin{cases}
\mathcal{M}\_1 \cup \mathcal{M} & \text{where } \mathcal{M} \cong \mathcal{M}\_2 \text{ and } \mathcal{M}\_1 \cup \mathcal{M} \text{ is defined} \\
\text{undefined} & \text{otherwise.}
\end{cases}
$$

*Example 2.* Figure 4 depicts the composition of two heap graphs representing lists of length two. Since both heap graphs share a variable $a \notin \mathsf{FV}$, we first compute an isomorphic heap graph in which variable $a$ is substituted by $c$ in the second graph. Both heap graphs are then merged at their common free variable $b$. This results in a heap graph modeling a list of length four.

*Forgetting free variables.* To construct larger heap graphs from smaller ones, we often need additional free variables to glue the right nodes together, e.g., the variable $b$ in Example 2. Consequently, we need a mechanism for subsequent removal of these variables from the set of free variables. To this end, for every heap graph $\mathcal{M}$ and sequence of free variables $\mathbf{x} \in \mathsf{FV}\_{\mathcal{M}}^{\ast}$, we define the operation $\mathsf{forget}\_{\mathbf{x}}(\mathcal{M}) = \langle \mathsf{Ptr}\_{\mathcal{M}}, \mathsf{FV}\_{\mathcal{M}} \setminus \mathbf{x}, \mathsf{calls}\_{\mathcal{M}} \rangle$.

*Single allocations.* The simplest non-empty heap graph is a single variable, say $x$, with pointers to a sequence $\mathbf{y}$ of finitely many other variables. We write x **y** to denote this *single-allocation heap graph* $\langle \{x \mapsto \mathbf{y}\}, \{x\} \cup \mathbf{y}, \emptyset \rangle$.

**Theorem 1 (**[8]**).** *Every non-empty heap graph of tree width at most* k *can be constructed from heap graphs* x **y***, renaming, composition, and forgetting using at most* k + 1 *free variables.*
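The three operations and single allocations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `HeapGraph` and the helpers `single`, `substitute`, `rename`, `forget`, `compose`, and `list2` are hypothetical names, not part of any tool discussed here, and `compose` only renames the auxiliary variables of its second argument to fresh names (a simplification of composition up to isomorphism). The final lines rebuild the list of length four from Example 2.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class HeapGraph:
    ptr: tuple        # sorted (source, targets) pairs; every source occurs once
    fv: frozenset     # free variables
    calls: frozenset  # predicate calls as (name, args); empty for concrete graphs

    def variables(self):
        vs = set(self.fv)
        for src, tgts in self.ptr:
            vs.update((src, *tgts))
        for _, args in self.calls:
            vs.update(args)
        return vs


def single(x, ys):
    """Single allocation: x points to the sequence ys; all variables are free."""
    return HeapGraph(((x, tuple(ys)),), frozenset((x, *ys)), frozenset())


def substitute(m, f):
    """Apply the variable map f (identity where undefined) to every component."""
    r = lambda v: f.get(v, v)
    ptr = tuple(sorted((r(s), tuple(map(r, ts))) for s, ts in m.ptr))
    calls = frozenset((p, tuple(map(r, args))) for p, args in m.calls)
    return HeapGraph(ptr, frozenset(map(r, m.fv)), calls)


def rename(m, xs, ys):
    """Rename the free variables xs to ys, position by position."""
    assert len(xs) == len(ys) and set(xs) <= m.fv
    return substitute(m, dict(zip(xs, ys)))


def forget(m, xs):
    """Remove xs from the free variables; pointers and calls are unchanged."""
    return HeapGraph(m.ptr, m.fv - set(xs), m.calls)


def compose(m1, m2):
    """Return the composition of m1 and m2, or None if it is undefined."""
    taken = m1.variables() | m2.variables()
    fresh = (f"_aux{i}" for i in count() if f"_aux{i}" not in taken)
    # Isomorphic copy of m2: its auxiliary variables get fresh names.
    m = substitute(m2, {a: next(fresh) for a in sorted(m2.variables() - m2.fv)})
    shared_ok = (m1.variables() & m.variables()) <= (m1.fv & m.fv)
    disjoint = not ({s for s, _ in m1.ptr} & {s for s, _ in m.ptr})
    if not (shared_ok and disjoint):
        return None
    return HeapGraph(tuple(sorted(m1.ptr + m.ptr)), m1.fv | m.fv, m1.calls | m.calls)


def list2(head, tail):
    """A list of length two from head to tail with auxiliary middle node 'a'."""
    return forget(compose(single(head, ["a"]), single("a", [tail])), ["a"])


# Example 2: glue two lists of length two at b, then forget b.
m = forget(compose(list2("x", "b"), list2("b", "y")), ["b"])
print(m.ptr, sorted(m.fv))
```

Both length-two lists use the auxiliary name `a`, so `compose` has to rename it in the second graph before the union is defined, exactly as in Example 2.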

### **3 Symbolic Heap Separation Logic**

We consider the symbolic heap fragment of separation logic with user-defined inductive predicate definitions. We omit pure formulas to simplify the presentation. Notice, however, that our implementation supports reasoning about symbolic heaps with pure formulas.

*Syntax.* The syntax of our simplified symbolic heap fragment is then given by the following context-free grammar:

$$
\varphi ::= \mathbf{emp} \mid x \mapsto \mathbf{y} \mid \mathbf{pred}(\mathbf{y}) \mid \exists x \colon \varphi \mid \varphi \* \varphi,
$$

where $x \in \mathbf{Var} \setminus \{\mathbf{null}\}$ is a variable, $\mathbf{y} \in \mathbf{Var}^{+}$ is a sequence of variables, and $\mathsf{pred}(\mathbf{y})$ is a predicate call. Here, $\mathbf{emp}$ is the *empty heap*, $x \mapsto \mathbf{y}$ asserts that $x$ *points-to* the locations captured by $\mathbf{y}$, $\exists x \colon \varphi$ is *existential quantification*, and $\ast$ is the *separating conjunction*. Because $\ast$ is commutative and associative and because existential quantifiers can always be moved to the front, we will always consider symbolic heaps to be of the form $\exists \mathbf{y} \colon (x\_1 \mapsto \mathbf{y}\_1) \ast \cdots \ast (x\_m \mapsto \mathbf{y}\_m) \ast \mathsf{pred}\_1(\mathbf{z}\_1) \ast \cdots \ast \mathsf{pred}\_n(\mathbf{z}\_n)$.

*Inductive definitions.* Before we assign formal semantics to symbolic heaps, we clarify how custom predicates are specified. To this end, a *system of inductive definitions* (SID) is a finite set $\Phi$ of rules of the form $\mathsf{pred} \Leftarrow \varphi$, where $\mathsf{pred} \in \mathbf{Preds}$ is a predicate symbol and $\varphi$ is a symbolic heap. We assume that all symbolic heaps of rules with head $\mathsf{pred}$ have the same sequence of free variables $(x\_1, \ldots, x\_{\mathsf{ar}(\mathsf{pred})})$<sup>2</sup> and collect these variables in the set $\mathsf{fv}(\mathsf{pred})$. Moreover, we collect all predicates that occur in SID $\Phi$ in the set $\mathbf{Preds}(\Phi)$ and all rules of SID $\Phi$ in the set $\mathbf{Rules}(\Phi)$. Examples of SIDs are found in Figs. 1 and 5.

<sup>2</sup> A variable is in the set fv(ϕ) of free variables of ϕ if it is not bound by a quantifier.

*Semantics.* We define the semantics of symbolic heaps $\varphi$ for a given SID $\Phi$ in terms of a force relation $\models\_{\Phi}$, which determines whether a heap graph $\mathcal{M}$ satisfies $\varphi$. To this end, let $\varphi[\mathbf{x}/\mathbf{y}]$ denote the symbolic heap $\varphi$ in which every free occurrence of variable $\mathbf{x}[i]$ is substituted by variable $\mathbf{y}[i]$, where $1 \leq i \leq |\mathbf{x}| = |\mathbf{y}|$. Then the relation $\models\_{\Phi}$ is defined inductively on the syntax of symbolic heaps:

$$\begin{array}{lcl}
\mathcal{M} \models\_{\Phi} \mathbf{emp} & \text{iff} & \text{ex. } \mathbf{x} \in \mathbf{Var}^{\ast} \text{ s.t. } \mathcal{M} = \langle \emptyset, \mathbf{x}, \emptyset \rangle \\
\mathcal{M} \models\_{\Phi} x \mapsto \mathbf{y} & \text{iff} & \text{ex. } \mathbf{z} \supseteq \{x\} \cup \mathbf{y} \text{ s.t. } \mathcal{M} = \langle \{x \mapsto \mathbf{y}\}, \mathbf{z}, \emptyset \rangle \\
\mathcal{M} \models\_{\Phi} \mathsf{pred}(\mathbf{y}) & \text{iff} & \text{ex. } \mathbf{z} \supseteq \mathbf{y} \text{ s.t. } \mathcal{M} \cong \langle \emptyset, \mathbf{z}, \{\mathsf{pred}(\mathbf{y})\} \rangle \text{ or} \\
& & \text{ex. } (\mathsf{pred} \Leftarrow \psi) \in \mathbf{Rules}(\Phi) \text{ s.t. } \mathcal{M} \models\_{\Phi} \psi[\mathsf{fv}(\mathsf{pred})/\mathbf{y}] \\
\mathcal{M} \models\_{\Phi} \exists x \colon \varphi & \text{iff} & \text{ex. } y \in \mathbf{Var} \text{ s.t. } \langle \mathsf{Ptr}\_{\mathcal{M}}, \mathsf{FV}\_{\mathcal{M}} \cup \{y\}, \mathsf{calls}\_{\mathcal{M}} \rangle \models\_{\Phi} \varphi[x/y] \\
\mathcal{M} \models\_{\Phi} \varphi\_1 \ast \varphi\_2 & \text{iff} & \text{ex. } \mathcal{M}\_1, \mathcal{M}\_2 \text{ s.t. } \mathcal{M} \cong \mathcal{M}\_1 \bullet \mathcal{M}\_2 \text{ and } \mathcal{M}\_1 \models\_{\Phi} \varphi\_1 \text{ and } \mathcal{M}\_2 \models\_{\Phi} \varphi\_2
\end{array}$$

The above semantics coincides with the standard least fixed-point semantics of symbolic heaps (cf. [4]) for stack-heap pairs if we restrict ourselves to concrete heap graphs. Moreover, there is a strong relationship between our SL semantics and the operations on heap graphs defined in Sect. 2.

**Lemma 1.** *Let* $\varphi = \exists \mathbf{y} \colon (x\_1 \mapsto \mathbf{y}\_1) \ast \cdots \ast (x\_m \mapsto \mathbf{y}\_m) \ast \mathsf{pred}\_1(\mathbf{z}\_1) \ast \cdots \ast \mathsf{pred}\_n(\mathbf{z}\_n)$ *be a symbolic heap. Then* $\mathcal{M} \models\_{\Phi} \varphi$ *iff there exist* $\mathcal{M}\_1, \ldots, \mathcal{M}\_{m+n}$ *such that (1)* $\mathcal{M}\_i \models\_{\Phi} x\_i \mapsto \mathbf{y}\_i$ *for* $1 \leq i \leq m$*, (2)* $\mathcal{M}\_{m+j} \models\_{\Phi} \mathsf{pred}\_j(\mathsf{fv}(\mathsf{pred}\_j))$ *for* $1 \leq j \leq n$*, and (3)* $\mathcal{M} = \mathsf{forget}\_{\mathbf{y}}(\mathcal{M}\_1 \bullet \cdots \bullet \mathcal{M}\_m \bullet \mathsf{rename}\_{\mathsf{fv}(\mathsf{pred}\_1),\mathbf{z}\_1}(\mathcal{M}\_{m+1}) \bullet \cdots \bullet \mathsf{rename}\_{\mathsf{fv}(\mathsf{pred}\_n),\mathbf{z}\_n}(\mathcal{M}\_{m+n}))$*.*

*Symbolic heaps with bounded tree-width.* Our goal is to develop a decision procedure for symbolic heaps with inductive definitions in the bounded tree-width fragment developed by Iosif et al. [10]. This fragment imposes three conditions on SIDs, which we assume for all SIDs Φ considered in the following:


*Assumptions.* We make two further assumptions for all SIDs throughout this paper: (1) *Predicates are called with pairwise different parameters*. (2) *Unfolding predicates (iteratively substituting predicate calls* $\mathsf{pred}(\mathbf{y})$ *with the right-hand sides* $\varphi[\mathsf{fv}(\mathsf{pred})/\mathbf{y}]$ *of rules* $\mathsf{pred} \Leftarrow \varphi$*) always yields satisfiable symbolic heaps*. SIDs can be transformed automatically to satisfy (1) and (2) before applying our decision procedure (cf. [1,14]). The SIDs in Figs. 1 and 5 satisfy all assumptions.

### **4 Profiles: An Abstraction for Concrete Heap Graphs**

*Entailment problem.* We present our approach for entailments $\mathsf{pred}\_1(\mathbf{x}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{y})$ between predicate calls $\mathsf{pred}\_1(\mathbf{x})$ and $\mathsf{pred}\_2(\mathbf{y})$ of an SID $\Phi$. We discuss the treatment of more general entailments at the end of Sect. 5. Formally, the entailment $\mathsf{pred}\_1(\mathbf{x}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{y})$ holds iff for all concrete heap graphs $\mathcal{M}$, we have that $\mathcal{M} \models\_{\Phi} \mathsf{pred}\_1(\mathbf{x})$ implies $\mathcal{M} \models\_{\Phi} \mathsf{pred}\_2(\mathbf{y})$.

*Model reconstruction.* Recall from Lemma 1 that $\mathcal{M} \models\_{\Phi} \mathsf{pred}\_1(\mathbf{x})$ can be interpreted as being able to construct $\mathcal{M}$ as a model of $\mathsf{pred}\_1(\mathbf{x})$ using the rules of SID $\Phi$ and our operations on heap graphs introduced in Sect. 2. To prove the entailment $\mathsf{pred}\_1(\mathbf{x}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{y})$, we then have to "reconstruct" any such $\mathcal{M}$ as a model of $\mathsf{pred}\_2(\mathbf{y})$. Since infinitely many model reconstructions might be required (after all, there might be infinitely many $\mathcal{M}$ with $\mathcal{M} \models\_{\Phi} \mathsf{pred}\_1(\mathbf{x})$), we now develop an abstraction of heap graphs such that finitely many abstract model reconstructions suffice to cover all models of $\mathsf{pred}\_1(\mathbf{x})$.

*Running example.* To sharpen our intuition, we present the technical details of our abstraction together with a running example: Fig. 5 shows an SID $\Phi\_{\text{lists}}$ specifying predicates for various singly-linked list segments. The predicate $\mathsf{sll}(x\_1, x\_2)$ specifies non-empty singly-linked list segments with head $x\_1$ and tail $x\_2$. Similarly, the predicates $\mathsf{odd}(x\_1, x\_2)$ and $\mathsf{even}(x\_1, x\_2)$ restrict such list segments to odd and even length, respectively. In the remainder of this and the next section, we will use our abstraction to show that the entailment $\mathsf{sll}(x\_1, x\_2) \models\_{\Phi\_{\text{lists}}} \mathsf{odd}(x\_1, x\_2)$ does *not* hold.

$$\begin{array}{lclclcl}
\mathsf{sll}(x\_1, x\_2) & \Leftarrow & x\_1 \mapsto x\_2 & \qquad & \mathsf{odd}(x\_1, x\_2) & \Leftarrow & x\_1 \mapsto x\_2 \\
\mathsf{sll}(x\_1, x\_2) & \Leftarrow & \exists y \colon x\_1 \mapsto y \ast \mathsf{sll}(y, x\_2) & & \mathsf{odd}(x\_1, x\_2) & \Leftarrow & \exists y \colon x\_1 \mapsto y \ast \mathsf{even}(y, x\_2) \\
& & & & \mathsf{even}(x\_1, x\_2) & \Leftarrow & \exists y \colon x\_1 \mapsto y \ast \mathsf{odd}(y, x\_2)
\end{array}$$

**Fig. 5.** SIDs $\Phi\_{\text{sll}}$ (left) and $\Phi\_{o/e}$ (right) specifying singly-linked list segments with head $x\_1$ and tail $x\_2$. Moreover, we define $\Phi\_{\text{lists}} = \Phi\_{\text{sll}} \cup \Phi\_{o/e}$.
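To build intuition for why the entailment from sll to odd fails, the rules of Fig. 5 can be unfolded up to a bound. The sketch below is a toy, not the decision procedure of this paper: it assumes that a model of these three predicates is fully described by the length of the list segment, and the encoding `PHI_LISTS` (each rule is just the called predicate, or `None` for the base case) and the function `model_lengths` are hypothetical names introduced for illustration.

```python
# Hypothetical encoding of Fig. 5: every rule allocates exactly one cell and
# either stops (None) or continues with a call to another predicate.
PHI_LISTS = {
    "sll":  [None, "sll"],
    "odd":  [None, "even"],
    "even": ["odd"],
}


def model_lengths(sid, bound):
    """Least fixed point of the list lengths (up to bound) derivable per predicate."""
    lengths = {pred: set() for pred in sid}
    changed = True
    while changed:
        changed = False
        for pred, rules in sid.items():
            for callee in rules:
                tails = {0} if callee is None else lengths[callee]
                new = {1 + n for n in tails if 1 + n <= bound} - lengths[pred]
                if new:
                    lengths[pred] |= new
                    changed = True
    return lengths


lengths = model_lengths(PHI_LISTS, bound=6)
counterexamples = sorted(lengths["sll"] - lengths["odd"])
print(counterexamples)  # [2, 4, 6]
```

Every even length is a model of sll but not of odd, so a list of length two already refutes the entailment. Bounded unfolding can only find such counterexamples; it cannot prove that an entailment holds.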

#### **4.1 Context Profiles as an Abstract Domain**

*Contexts.* Our proposed abstraction is based on *contexts*. Intuitively, every context describes an extension of a concrete heap graph by predicate calls such that the resulting graph satisfies a fixed predicate call. Thus, contexts reveal what is missing in a concrete heap graph to reconstruct models of predicate calls.

**Definition 2 (Context).** *A triple* $\mathcal{C} = \langle V, \mathsf{pred}(\mathbf{x}), \mathsf{calls} \rangle$ *is a* context *of a concrete heap graph* $\mathcal{M}$ *w.r.t. SID* $\Phi$ *if (1)* $V = \mathsf{FV}\_{\mathcal{M}}$*, (2)* $\langle \mathsf{Ptr}\_{\mathcal{M}}, \mathbf{x}, \mathsf{calls} \rangle \models\_{\Phi} \mathsf{pred}(\mathbf{x})$*, and (3) neither* $\mathbf{x}$ *nor* $\mathsf{calls}$ *contain auxiliary variables of* $\mathcal{M}$*. Moreover, we define the set of* free variables *of context* $\mathcal{C}$ *as* $\mathsf{fv}(\mathcal{C}) := V$*. We call variables in* $\mathbf{x}$ *or* $\mathsf{calls}$*, but not in* $\mathsf{fv}(\mathcal{C})$*, the* auxiliary variables *of* $\mathcal{C}$*.*

*Example 3.* Figure 6 shows contexts for two concrete heap graphs $\mathcal{M}\_{\text{odd}}$ and $\mathcal{M}\_{\text{even}}$ of odd and even length (without dashes), respectively. The extension by calls from the contexts is illustrated by dashed lines. Intuitively, context $\mathcal{C}\_1$ states that no extension of $\mathcal{M}\_{\text{odd}}$ is needed to obtain a model of predicate $\mathsf{odd}(x\_1, x\_2)$. Context $\mathcal{C}\_2$ states that, in order to obtain an odd list segment from $x\_1$ to $a$, where $a$ is an additional free variable, we have to add an even list segment from $x\_2$ to $a$. Similarly, we obtain an even list segment from $x\_1$ to some fresh variable $a$ by adding an odd list segment from $x\_2$ to $a$. The interpretation of contexts $\mathcal{C}\_4$, $\mathcal{C}\_5$, and $\mathcal{C}\_6$ of $\mathcal{M}\_{\text{even}}$ is analogous.

*Context decompositions.* A context of heap graph $\mathcal{M}$ stores the free variables of $\mathcal{M}$. These variables are important, because additional free variables might allow us to split a heap graph into several smaller ones. For example, the additional free variable $b$ in Fig. 4 (read from right to left) allows us to decompose a list into two lists. Since our goal is to develop a compositional abstraction, we have to take contexts of decompositions of heap graphs into account. In general, these decompositions are relevant for entailment when considering more complicated SIDs, e.g., doubly-linked binary trees or trees with linked leaves. We thus have to compute decompositions $\mathcal{M}\_1 \bullet \ldots \bullet \mathcal{M}\_k$, $k \geq 1$, of a concrete heap graph $\mathcal{M}$ and then consider a context for each component.

**Definition 3 (Context decomposition).** *A* context decomposition *of a concrete heap graph* $\mathcal{M}$ *w.r.t. SID* $\Phi$ *is a set* $\mathcal{E} = \{\mathcal{C}\_1, \ldots, \mathcal{C}\_k\}$ *such that* $\mathcal{M} = \mathcal{M}\_1 \bullet \ldots \bullet \mathcal{M}\_k$*,* $k \geq 1$*, is a decomposition of* $\mathcal{M}$ *and* $\mathcal{C}\_1, \ldots, \mathcal{C}\_k$ *are contexts of the concrete heap graphs* $\mathcal{M}\_1, \ldots, \mathcal{M}\_k$ *w.r.t.* $\Phi$*, respectively. Moreover, we define the set of* free variables *of context decomposition* $\mathcal{E}$ *as* $\mathsf{fv}(\mathcal{E}) := \bigcup\_{\mathcal{C} \in \mathcal{E}} \mathsf{fv}(\mathcal{C})$*.*

**Fig. 6.** Contexts of concrete heap graphs Modd (first graph) and Meven (fourth graph). The extensions by a context are drawn in dashed lines.

*Example 4.* The concrete heap graph $\mathcal{M}\_{\text{odd}}$ in Fig. 6 cannot be decomposed into smaller graphs due to a lack of free variables. Hence, context decompositions of $\mathcal{M}\_{\text{odd}}$ are singletons consisting of $\mathcal{C}\_1$, $\mathcal{C}\_2$, and $\mathcal{C}\_3$ in Fig. 6, respectively.

*Profiles.* As the above example shows, concrete heap graphs may have multiple context decompositions. We thus abstract a concrete heap graph $\mathcal{M}$ by the set of all context decompositions of $\mathcal{M}$:

**Definition 4 (Profiles).** *The* profile $\mathsf{profile}\_{\Phi}(\mathcal{M})$ *of a concrete heap graph* $\mathcal{M}$ *w.r.t. SID* $\Phi$ *is the set of all context decompositions of* $\mathcal{M}$ *w.r.t.* $\Phi$*. Moreover, since all* $\mathcal{E} \in \mathcal{P}$ *have the same free variables, we define the free variables of a profile* $\mathcal{P}$ *as* $\mathsf{fv}(\mathcal{P}) := \mathsf{fv}(\mathcal{E})$ *for some* $\mathcal{E} \in \mathcal{P}$*.*

*Refinement property.* We propose profiles as a suitable abstraction for deciding entailments. We will argue that they comply with the four essential correctness properties discussed in Sect. 1: refinement, finiteness, compositionality, and effectiveness. Refinement means that two concrete heap graphs with the same profile satisfy the same SID predicates. Hence, for each profile and predicate $\mathsf{pred}$, it suffices to find a single model of $\mathsf{pred}$ with that profile. Formally,

**Lemma 2.** *Let* $\mathcal{M}, \mathcal{M}'$ *be concrete heap graphs with* $\mathsf{profile}\_{\Phi}(\mathcal{M}) = \mathsf{profile}\_{\Phi}(\mathcal{M}')$*. Then, for all* $\mathsf{pred} \in \mathbf{Preds}(\Phi)$*, we have* $\mathcal{M} \models\_{\Phi} \mathsf{pred}(\mathbf{x})$ *iff* $\mathcal{M}' \models\_{\Phi} \mathsf{pred}(\mathbf{x})$*.*

*Finiteness.* In general, the set of profiles of concrete heap graphs is infinite due to different names for additional free variables, e.g., variable $a$ in Fig. 6. To obtain a finite set of profiles, we thus (a) limit the total number of free variables, (b) consider profiles up to renaming of additional free variables, and (c) exploit the *connectivity condition*. Notice that condition (a) is not a restriction, because the number of free variables for every SID and thus every entailment query is bounded. For condition (b), we have to lift the notion of isomorphism from heap graphs to profiles. Formally, contexts $\mathcal{C}\_1 = \langle \mathbf{z}\_1, \mathsf{pred}\_1(\mathbf{x}\_1), \mathsf{calls}\_1 \rangle$ and $\mathcal{C}\_2 = \langle \mathbf{z}\_2, \mathsf{pred}\_2(\mathbf{x}\_2), \mathsf{calls}\_2 \rangle$ are isomorphic iff $\mathbf{z}\_1 = \mathbf{z}\_2$, $\mathsf{pred}\_1 = \mathsf{pred}\_2$, and there exists a bijective function $f \colon \mathbf{Var} \to \mathbf{Var}$ such that (1) $f(z) = z$ for all $z \in \mathbf{z}\_1$, (2) $f(\mathbf{x}\_1) = \mathbf{x}\_2$, and (3) $\mathsf{calls}\_2 = \{\mathsf{pred}(f(\mathbf{y})) \mid \mathsf{pred}(\mathbf{y}) \in \mathsf{calls}\_1\}$. Moreover, two context decompositions $\mathcal{E}\_1, \mathcal{E}\_2$ are isomorphic iff for all $i \in \{1, 2\}$ and contexts $\mathcal{C} \in \mathcal{E}\_i$ there is a context $\mathcal{C}' \in \mathcal{E}\_{3-i}$ that is isomorphic to $\mathcal{C}$. Analogously, two profiles $\mathcal{P}\_1, \mathcal{P}\_2$ are isomorphic iff for all $i \in \{1, 2\}$ and context decompositions $\mathcal{E} \in \mathcal{P}\_i$ there exists a context decomposition $\mathcal{E}' \in \mathcal{P}\_{3-i}$ that is isomorphic to $\mathcal{E}$.

*Throughout this paper, we do not distinguish between isomorphic contexts, context decompositions, or profiles.*
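The isomorphism test on contexts can be made concrete with a brute-force search over renamings of auxiliary variables. The following is a minimal sketch, assuming a hypothetical encoding of a context as a Python triple `(free_vars, (pred, args), calls)`; the helper names `aux_vars` and `isomorphic` are illustrative only. Since the bijection must fix all free variables, it suffices to try every assignment of the auxiliary variables of one context to those of the other.

```python
from itertools import permutations


def aux_vars(ctx):
    """Variables in the root call or the pending calls that are not free."""
    fv, (_, args), calls = ctx
    used = set(args).union(*(ys for _, ys in calls))
    return sorted(used - fv)


def isomorphic(c1, c2):
    """Check context isomorphism by trying all bijections on auxiliary variables."""
    (fv1, (p1, a1), calls1), (fv2, (p2, a2), calls2) = c1, c2
    aux1, aux2 = aux_vars(c1), aux_vars(c2)
    if fv1 != fv2 or p1 != p2 or len(aux1) != len(aux2):
        return False
    for image in permutations(aux2):
        f = dict(zip(aux1, image))
        r = lambda v: f.get(v, v)  # identity on free variables
        same_root = tuple(map(r, a1)) == tuple(a2)
        same_calls = frozenset((p, tuple(map(r, ys))) for p, ys in calls1) == calls2
        if same_root and same_calls:
            return True
    return False


FV = frozenset({"x1", "x2"})
# Two copies of a context shaped like C2 from Example 3, differing only in
# the name of the auxiliary variable.
c_a = (FV, ("odd", ("x1", "a")), frozenset({("even", ("x2", "a"))}))
c_b = (FV, ("odd", ("x1", "b")), frozenset({("even", ("x2", "b"))}))
# A context whose pending call has a different shape.
c_other = (FV, ("odd", ("x1", "a")), frozenset({("even", ("x2", "x1"))}))

print(isomorphic(c_a, c_b), isomorphic(c_a, c_other))  # True False
```

The search is exponential in the number of auxiliary variables, which is acceptable for a sketch because that number is bounded once the number of free variables is limited.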

**Lemma 3.** *For every SID* $\Phi$ *and variable sequence* $\mathbf{x} \in \mathbf{Var}^{\ast}$*, the set of profiles* $\mathbf{Profiles}^{\mathbf{x}}(\Phi) = \{\mathsf{profile}\_{\Phi}(\mathcal{M}) \mid \mathcal{M} \text{ concrete heap graph}, \mathsf{fv}(\mathsf{profile}\_{\Phi}(\mathcal{M})) \subseteq \mathbf{x}\}$ *is finite up to profile isomorphism.*

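*Example 5.* Recall from Fig. 5 the SID $\Phi\_{o/e}$. Moreover, recall from Fig. 6 the concrete heap graphs $\mathcal{M}\_{\text{odd}}$ and $\mathcal{M}\_{\text{even}}$ and their contexts $\mathcal{C}\_1, \mathcal{C}\_2, \mathcal{C}\_3$ and $\mathcal{C}\_4, \mathcal{C}\_5, \mathcal{C}\_6$, respectively. Then the profiles of $\mathcal{M}\_{\text{odd}}$ and $\mathcal{M}\_{\text{even}}$ w.r.t. $\Phi\_{o/e}$ are (up to isomorphism) $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}}) = \{\{\mathcal{C}\_1\}, \{\mathcal{C}\_2\}, \{\mathcal{C}\_3\}\}$ and $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}}) = \{\{\mathcal{C}\_4\}, \{\mathcal{C}\_5\}, \{\mathcal{C}\_6\}\}$. In fact, the profile of every singly-linked list segment from $x\_1$ to $x\_2$ of odd (even) length is isomorphic to $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}})$ ($\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}})$). Hence, the profile of every model of the singly-linked list predicate $\mathsf{sll}(x\_1, x\_2)$ is either $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}})$ or $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}})$.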

#### **4.2 Computation of Profiles**

Due to Lemmas 2 and 3, we can decide an entailment $\mathsf{pred}\_1(\mathbf{x}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{x})$ once the profiles of all models of $\mathsf{pred}\_1(\mathbf{x})$ with respect to the rules relevant for $\mathsf{pred}\_2(\mathbf{x})$ are known. The key insight underlying our entailment checker is that profiles can be computed automatically in a compositional manner. To this end, recall from Theorem 1 that every concrete heap graph can be constructed from single-allocation heap graphs x **y** by means of renaming, forgetting, and composition. We exploit this by (1) devising an algorithm to compute $\mathsf{profile}\_{\Phi}$(x **y**) and (2) lifting the operations $\mathsf{rename}\_{\mathbf{x},\mathbf{y}}$, $\mathsf{forget}\_{\mathbf{x}}$, and $\bullet$ for renaming, forgetting, and composition of heap graphs to operations $\overline{\mathsf{rename}}\_{\mathbf{x},\mathbf{y}}$, $\overline{\mathsf{forget}}\_{\mathbf{x}}$, and $\overline{\bullet}$ on profiles.

*Profiles of single allocations.* Since single allocations x **y** cannot be further decomposed, every context decomposition of x **y** w.r.t. an SID $\Phi$ is a singleton. Due to the progress condition, every rule of $\Phi$ contains exactly one points-to assertion. For each SID rule $\mathsf{pred} \Leftarrow \exists \mathbf{z} \colon x \mapsto \mathbf{y} \ast \mathsf{pred}\_1(\mathbf{y}\_1) \ast \cdots \ast \mathsf{pred}\_k(\mathbf{y}\_k)$, the corresponding context $\langle \{x\} \cup \mathbf{y}, \mathsf{pred}(\mathbf{x}), \{\mathsf{pred}\_1(\mathbf{y}\_1), \ldots, \mathsf{pred}\_k(\mathbf{y}\_k)\} \rangle$ must be in the profile of x **y** iff x **y** is a model of $\exists \mathbf{z} \colon x \mapsto \mathbf{y}$. Hence:

**Lemma 4.** *Profiles of single allocations, i.e.,* profileΦ(x **y**)*, are computable.*

*Rename for profiles.* We lift the operation $\mathsf{rename}\_{\mathbf{x},\mathbf{y}}$, which renames each variable in $\mathbf{x}$ to the corresponding variable in $\mathbf{y}$ according to their position, from heap graphs to contexts, context decompositions, and profiles by componentwise application. That is, for a context $\mathcal{C} = \langle \mathbf{z}, \mathsf{pred}(\mathbf{u}), \mathsf{calls} \rangle$, a context decomposition $\mathcal{E}$, and a profile $\mathcal{P}$, we define:

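$$\begin{aligned}
\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{C}) &:= \langle \mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathbf{z}), \mathsf{pred}(\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathbf{u})), \{\mathsf{pred}'(\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathbf{v})) \mid \mathsf{pred}'(\mathbf{v}) \in \mathsf{calls}\} \rangle \\
\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{E}) &:= \{\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{C}) \mid \mathcal{C} \in \mathcal{E}\} \\
\overline{\mathsf{rename}}\_{\mathbf{x},\mathbf{y}}(\mathcal{P}) &:= \{\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{E}) \mid \mathcal{E} \in \mathcal{P}\}
\end{aligned}$$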

*Forget for profiles.* Next, we lift the operation $\mathsf{forget}\_{\mathbf{x}}$, which removes variables in $\mathbf{x}$ from the set of free variables, to contexts, context decompositions, and profiles. For a profile, forgetting a free variable means that some of its constituting context decompositions do not have to be considered anymore, because the composition of their underlying models is no longer defined. Hence, these decompositions are removed. Formally, for a context $\mathcal{C} = \langle \mathbf{z}, \mathsf{pred}(\mathbf{u}), \mathsf{calls} \rangle$, a context decomposition $\mathcal{E}$, and a profile $\mathcal{P}$, we define:

$$\begin{aligned}
\mathsf{forget}\_{\mathbf{x}}(\mathcal{C}) &:= \langle \mathbf{z} \setminus \mathbf{x}, \mathsf{pred}(\mathbf{u}), \mathsf{calls} \rangle \\
\mathsf{forget}\_{\mathbf{x}}(\mathcal{E}) &:= \{\mathsf{forget}\_{\mathbf{x}}(\mathcal{C}) \mid \mathcal{C} \in \mathcal{E}\} \\
\overline{\mathsf{forget}}\_{\mathbf{x}}(\mathcal{P}) &:= \{\mathsf{forget}\_{\mathbf{x}}(\mathcal{E}) \mid \mathcal{E} \in \mathcal{P} \text{ and } \mathbf{x} \cap \mathsf{usedvs}(\mathcal{E}) = \emptyset\} \\
\mathsf{usedvs}(\mathcal{E}) &:= \bigcup\_{\mathcal{C} \in \mathcal{E}} \mathsf{usedvs}(\mathcal{C}) \\
\mathsf{usedvs}(\mathcal{C}) &:= \mathbf{u} \cup \bigcup\_{\mathsf{pred}'(\mathbf{y}) \in \mathsf{calls}} \mathbf{y}
\end{aligned}$$
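The lifted rename and forget operations translate almost directly into code. The sketch below assumes a hypothetical encoding: a context is a triple `(free_vars, (pred, args), calls)`, a decomposition is a `frozenset` of contexts, and a profile is a `frozenset` of decompositions. The sample profile is made up for illustration; it is shaped after the situation in which two list segments have been glued at `x2`, with one decomposition already merged into a single context and one still consisting of two contexts.

```python
def rename_ctx(ctx, xs, ys):
    """Rename xs to ys (by position) in every component of a context."""
    f = dict(zip(xs, ys))
    r = lambda v: f.get(v, v)
    fv, (pred, args), calls = ctx
    return (frozenset(map(r, fv)), (pred, tuple(map(r, args))),
            frozenset((p, tuple(map(r, vs))) for p, vs in calls))


def rename_profile(profile, xs, ys):
    return frozenset(frozenset(rename_ctx(c, xs, ys) for c in dec) for dec in profile)


def usedvs(dec):
    """Variables occurring in the root call or a pending call of some context."""
    used = set()
    for _, (_, args), calls in dec:
        used.update(args)
        for _, vs in calls:
            used.update(vs)
    return used


def forget_ctx(ctx, xs):
    fv, root, calls = ctx
    return (fv - set(xs), root, calls)


def forget_profile(profile, xs):
    """Drop decompositions that still use a forgotten variable, shrink the rest."""
    return frozenset(
        frozenset(forget_ctx(c, xs) for c in dec)
        for dec in profile
        if not set(xs) & usedvs(dec))


# Hypothetical profile with two decompositions of a list from x1 via x2 to v.
merged = (frozenset({"x1", "x2", "v"}), ("odd", ("x1", "v")), frozenset())
left = (frozenset({"x1", "x2"}), ("odd", ("x1", "x2")), frozenset())
right = (frozenset({"x2", "v"}), ("even", ("x2", "v")), frozenset())
profile = frozenset({frozenset({merged}), frozenset({left, right})})

result = rename_profile(forget_profile(profile, ["x2"]), ["v"], ["x2"])
print(result)
```

Only the merged decomposition survives forgetting `x2`, because the two-context decomposition still mentions `x2` in its calls; renaming `v` to `x2` afterwards yields a single context for an odd list segment from `x1` to `x2`.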

*Composition for profiles.* It remains to lift heap graph composition to profiles. This is formalized as substituting predicate calls of contexts by other contexts:

**Definition 5 (Context substitution).** *Let* $\mathcal{C}\_1 = \langle \mathbf{x\_1}, \mathsf{pred}\_1(\mathbf{z\_1}), \mathsf{calls}\_1 \rangle$ *and* $\mathcal{C}\_2 = \langle \mathbf{x\_2}, \mathsf{pred}\_2(\mathbf{z\_2}), \mathsf{calls}\_2 \rangle$ *be contexts such that (1)* $\mathsf{pred}\_1(\mathbf{z\_1}) \in \mathsf{calls}\_2$ *and (2) no auxiliary variable of* $\mathcal{C}\_2$ *is a free variable of* $\mathcal{C}\_1$ *and vice versa. Then the* substitution *of* $\mathsf{pred}\_1(\mathbf{z\_1})$ *in* $\mathcal{C}\_2$ *by* $\mathcal{C}\_1$ *is given by*

$$\mathcal{C}\_2[\mathcal{C}\_1] := \langle \mathbf{x\_1} \cup \mathbf{x\_2}, \mathsf{pred}\_2(\mathbf{z\_2}), (\mathsf{calls}\_2 \setminus \{\mathsf{pred}\_1(\mathbf{z\_1})\}) \cup \mathsf{calls}\_1 \rangle. \qquad \triangle$$
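Definition 5 can be read as a partial function on pairs of contexts. The sketch below implements it literally, under the assumption that a context is encoded as a hypothetical triple `(free_vars, (pred, args), calls)`; it returns `None` when a side condition fails and makes no attempt to rename auxiliary variables first, so matching up to isomorphism is out of scope here. The two sample contexts are made up for illustration.

```python
def aux_vars(ctx):
    """Variables in the root call or the pending calls that are not free."""
    fv, (_, args), calls = ctx
    used = set(args).union(*(vs for _, vs in calls))
    return used - fv


def substitute(c2, c1):
    """Return c2[c1] as in Definition 5, or None if a side condition fails."""
    fv1, root1, calls1 = c1
    fv2, root2, calls2 = c2
    if root1 not in calls2:                           # condition (1)
        return None
    if aux_vars(c2) & fv1 or aux_vars(c1) & fv2:      # condition (2)
        return None
    return (fv1 | fv2, root2, (calls2 - {root1}) | calls1)


# c2 still needs an even segment from x2 to x3; c1 provides exactly that.
c1 = (frozenset({"x2", "x3"}), ("even", ("x2", "x3")), frozenset())
c2 = (frozenset({"x1", "x2", "x3"}), ("odd", ("x1", "x3")),
      frozenset({("even", ("x2", "x3"))}))

print(substitute(c2, c1))
print(substitute(c1, c2))  # None: the root of c2 is not a pending call of c1
```

After the substitution the pending call of `c2` has been discharged, so the resulting context states that no further extension is needed to obtain an odd list segment from `x1` to `x3`.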

To compose profiles, we attempt to substitute the underlying contexts with each other in all possible ways. Formally, a context decomposition $\mathcal{E}\_1$ *derives* a context decomposition $\mathcal{E}\_2$, written $\mathcal{E}\_1 \rhd \mathcal{E}\_2$, iff there exist contexts $\mathcal{C}\_1, \mathcal{C}\_2 \in \mathcal{E}\_1$ such that $\mathcal{E}\_2 = (\mathcal{E}\_1 \setminus \{\mathcal{C}\_1, \mathcal{C}\_2\}) \cup \{\mathcal{C}\_2[\mathcal{C}\_1]\}$.<sup>3</sup> We denote by $\rhd^{\ast}$ the reflexive-transitive closure of the derivation relation $\rhd$. The composition of two profiles then consists of all context decompositions derivable from some decompositions of both profiles:

**Definition 6 (Composition of profiles).** *Let* $\mathcal{P}\_1$ *and* $\mathcal{P}\_2$ *be profiles w.r.t.* $\Phi$*. Then the* composition $\mathcal{P}\_1 \,\overline{\bullet}\, \mathcal{P}\_2$ *of* $\mathcal{P}\_1$ *and* $\mathcal{P}\_2$ *is defined as*

$$\mathcal{P}\_1 \,\overline{\bullet}\, \mathcal{P}\_2 := \{ \mathcal{E} \mid \exists \mathcal{E}\_1 \in \mathcal{P}\_1, \mathcal{E}\_2 \in \mathcal{P}\_2 \colon \mathcal{E}\_1 \cup \mathcal{E}\_2 \rhd^{\ast} \mathcal{E} \}.$$

*Compositionality.* Our lifted heap graph operations satisfy the compositionality property mentioned in Sect. 1. That is,

**Theorem 2.** *For all concrete heap graphs* $\mathcal{M}$*,* $\mathcal{M}'$ *and every SID* $\Phi$*, we have*

$$\begin{array}{rcl}
\overline{\mathsf{rename}}\_{\mathbf{x},\mathbf{y}}(\mathsf{profile}\_{\Phi}(\mathcal{M})) &=& \mathsf{profile}\_{\Phi}(\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{M})) \\
\overline{\mathsf{forget}}\_{\mathbf{x}}(\mathsf{profile}\_{\Phi}(\mathcal{M})) &=& \mathsf{profile}\_{\Phi}(\mathsf{forget}\_{\mathbf{x}}(\mathcal{M})) \\
\mathsf{profile}\_{\Phi}(\mathcal{M}) \,\overline{\bullet}\, \mathsf{profile}\_{\Phi}(\mathcal{M}') &=& \mathsf{profile}\_{\Phi}(\mathcal{M} \bullet \mathcal{M}')
\end{array}$$

*provided that* $\mathsf{rename}\_{\mathbf{x},\mathbf{y}}(\mathcal{M})$*,* $\mathsf{forget}\_{\mathbf{x}}(\mathcal{M})$*, and* $\mathcal{M} \bullet \mathcal{M}'$ *are defined, respectively.*

*Example 6.* Recall from Fig. 6 the heap graphs $\mathcal{M}\_{\text{odd}}$ and $\mathcal{M}\_{\text{even}}$ whose profiles w.r.t. $\Phi\_{o/e}$ capture all singly-linked lists. We can construct a concrete heap graph $\mathcal{M}$ representing a list of length five from $x\_1$ to $x\_2$ by computing

$$\mathcal{M} := \mathsf{rename}\_{v, x\_2} \left( \mathsf{forget}\_{x\_2} \left( \mathcal{M}\_{\text{odd}} \bullet \mathsf{rename}\_{(x\_1, x\_2), (x\_2, v)} (\mathcal{M}\_{\text{even}}) \right) \right).$$

<sup>3</sup> Recall that all definitions are to be read up to isomorphism, i.e., auxiliary variables of C1, C2, and E<sup>2</sup> may be renamed prior to the substitution.

Then, by Theorem 2, the corresponding profile $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M})$ is given by:

$$\overline{\mathsf{rename}}\_{v,x\_2} \Big( \overline{\mathsf{forget}}\_{x\_2} \Big( \mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}}) \,\overline{\bullet}\, \overline{\mathsf{rename}}\_{(x\_1, x\_2), (x\_2, v)} \big( \mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}}) \big) \Big) \Big)$$

This profile, in turn, coincides with the profile of $\mathcal{M}\_{\text{odd}}$, i.e., we have

$$\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}) = \mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}}).$$

In particular, notice that without the forget statement, we would obtain a heap graph $\mathcal{M}'$ with an additional free variable. The additional free variable would also influence the profile of $\mathcal{M}'$, because there exist more decompositions of $\mathcal{M}'$ into heap graphs $\mathcal{M}\_1 \bullet \mathcal{M}\_2$. Consequently, there are also more context decompositions of $\mathcal{M}'$ and thus $\mathcal{M}'$ has a larger profile.

### **5 An Effective Decision Procedure for Entailment**

*Profile analysis.* We now exploit our abstract domain to develop a decision procedure for entailments of the form $\mathsf{pred}\_1(\mathbf{a}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{b})$. Let us first consider the case in which the parameters $\mathbf{a}$ and $\mathbf{b}$ coincide with the free variables in the rules of the SID, i.e., $\mathbf{a} = \mathsf{fv}(\mathsf{pred}\_1) =: \mathbf{x\_1}$ and $\mathbf{b} = \mathsf{fv}(\mathsf{pred}\_2) =: \mathbf{x\_2}$. Our key observation is then that analyzing profiles of the entailment's left-hand side suffices to discharge it: The entailment $\mathsf{pred}\_1(\mathbf{x\_1}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{x\_2})$ holds iff the profile of every model $\mathcal{M}$ of $\mathsf{pred}\_1(\mathbf{x\_1})$ contains a context decomposition stating that a model of $\mathsf{pred}\_2(\mathbf{x\_2})$ can be reconstructed from $\mathcal{M}$. Formally,

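**Theorem 3.** *The entailment* $\mathsf{pred}\_1(\mathbf{x\_1}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{x\_2})$ *holds iff for all concrete heap graphs* $\mathcal{M}$ *with* $\mathcal{M} \models\_{\Phi} \mathsf{pred}\_1(\mathbf{x\_1})$*,* $\{\langle \mathsf{FV}\_{\mathcal{M}}, \mathsf{pred}\_2(\mathbf{x\_2}), \emptyset \rangle\} \in \mathsf{profile}\_{\Phi}(\mathcal{M})$*.*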

*Example 7.* Recall the profiles $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}})$ and $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{odd}})$ from Example 5 computed for models of $\mathsf{sll}(x\_1, x\_2)$ w.r.t. SID $\Phi\_{o/e}$ (Fig. 5). We now use these profiles to disprove the entailment $\mathsf{sll}(x\_1, x\_2) \models\_{\Phi\_{\text{lists}}} \mathsf{odd}(x\_1, x\_2)$: First, observe that all predicates relevant for constructing models of $\mathsf{odd}(x\_1, x\_2)$ belong to $\Phi\_{o/e} \subseteq \Phi\_{\text{lists}}$. Second, the profile $\mathsf{profile}\_{\Phi\_{o/e}}(\mathcal{M}\_{\text{even}})$ does not contain a context decomposition $\{\langle \{x\_1, x\_2\}, \mathsf{odd}(x\_1, x\_2), \emptyset \rangle\}$. Hence, by Theorem 3, the entailment does not hold as we cannot reconstruct $\mathcal{M}\_{\text{even}}$ as a model of predicate $\mathsf{odd}(x\_1, x\_2)$.

*Computing profiles.* By Theorem 3, to decide whether $\mathsf{pred}\_1(\mathbf{x\_1}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{x\_2})$ holds, it suffices to compute the finite (by Lemma 3) set of all profiles of models of $\mathsf{pred}\_1(\mathbf{x\_1})$. This is performed by the procedure abstractSID($\Phi$) shown in Algorithm 1. To understand how the algorithm works, recall how predicates can be unrolled to compute a model: We select an SID rule and replace all of its predicate calls with previously computed models. By Lemma 1, this amounts to performing heap graph operations. That is, we first rename the free variables of previously computed models to match the parameters of predicate calls. After that, the resulting models and the single allocation (due to the progress condition) of the rule are composed into a single heap graph. Finally, we apply a forget operation to remove free variables that have been existentially quantified.

**Algorithm 1:** The algorithm abstractSID($\Phi$) computes a function $f$ that maps each predicate $\mathsf{pred} \in \mathbf{Preds}(\Phi)$ to the set of profiles $\{\mathsf{profile}\_{\Phi}(\mathcal{M}) \mid \mathcal{M} \models\_{\Phi} \mathsf{pred}(\mathsf{fv}(\mathsf{pred}))\}$.

```
 1  f_curr := λpred. ∅;
 2  repeat
 3      f_prev := f_curr;
 4      for pred ∈ Preds(Φ) do
 5          for (pred ⇐ ∃y: x ↦ z0 ∗ pred1(z1) ∗ ··· ∗ predk(zk)) ∈ Rules(Φ) do
 6              P0 := profileΦ(⟨{x ↦ z0}, {x} ∪ z0, ∅⟩);    // profile of the single allocation
 7              for F1 ∈ f_prev(pred1), ..., Fk ∈ f_prev(predk) do
 8                  for i ∈ {1, ..., k} do
 9                      Pi := rename_{fv(predi),zi}(Fi);       // lifted rename on profiles
10                  P := forget_y(P0 • P1 • ··· • Pk);         // lifted forget and composition
11                  f_curr(pred) := f_curr(pred) ∪ {P};
12  until f_curr = f_prev;
13  return f_curr
```

Algorithm 1 behaves analogously. However, instead of applying operations on heap graphs, it applies our *abstract* operations on profiles (cf. Theorem 2): We select an SID rule $\mathsf{pred} \Leftarrow \varphi$ in line 5. By Lemma 4, we can compute the profile of the single allocation in $\varphi$ (l. 6). We then select previously computed profiles for the predicates called in $\varphi$ and rename their free variables to match the parameters of the predicate calls in $\varphi$ (l. 7–9). Finally, the selected profiles are composed, the existentially quantified variables are forgotten, and the result is added to the computed profiles of predicate $\mathsf{pred}$ (l. 10, 11). The algorithm then proceeds by computing profiles until a fixed point is reached (l. 12).
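The control flow of Algorithm 1 does not depend on what a profile looks like, which the following sketch makes explicit by taking the abstract operations as parameters. Everything here is an assumption made for illustration: the function `abstract_sid`, the rule encoding `(bound_vars, allocation, calls)`, and in particular the toy domain, which replaces profiles by the parity of the list length (1 for odd, 0 for even). The toy domain is adequate only for the list predicates of Fig. 5 and is not the profile domain of this paper.

```python
from itertools import product


def abstract_sid(sid, single_profile, rename, compose, forget):
    """Fixed-point iteration shaped like Algorithm 1, generic in the abstract domain."""
    f_curr = {pred: set() for pred in sid}
    while True:
        f_prev = {pred: set(values) for pred, values in f_curr.items()}
        for pred, rules in sid.items():
            for bound, alloc, calls in rules:
                p0 = single_profile(alloc)
                # One previously computed abstract value per predicate call.
                for choice in product(*(f_prev[callee] for callee, _ in calls)):
                    p = p0
                    for (callee, args), value in zip(calls, choice):
                        p = compose(p, rename(value, callee, args))
                    f_curr[pred].add(forget(p, bound))
        if f_curr == f_prev:
            return f_curr


# Hypothetical encoding of the SID of Fig. 5.
PHI_LISTS = {
    "sll":  [((), ("x1", "x2"), []),
             (("y",), ("x1", "y"), [("sll", ("y", "x2"))])],
    "odd":  [((), ("x1", "x2"), []),
             (("y",), ("x1", "y"), [("even", ("y", "x2"))])],
    "even": [(("y",), ("x1", "y"), [("odd", ("y", "x2"))])],
}

ODD = 1
profiles = abstract_sid(
    PHI_LISTS,
    single_profile=lambda alloc: ODD,             # a single cell has length one
    rename=lambda value, callee, args: value,     # parity ignores variable names
    compose=lambda p, q: (p + q) % 2,             # lengths add up
    forget=lambda value, bound: value,
)


def entails(lhs, rhs):
    """Toy check: every abstract value of lhs-models is one of an rhs-model."""
    return profiles[lhs] <= profiles[rhs]


print(entails("sll", "odd"), entails("odd", "sll"))  # False True
```

Termination follows for the same reason as in the paper: the abstract domain is finite, so the sets in `f_curr` can grow only finitely often. The subset test in `entails` relies on the abstraction having the refinement property, which parity has for these three predicates.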

*Correctness.* Algorithm 1 is guaranteed to terminate due to the finiteness of our abstract domain (Lemma 3). Moreover, it computes the desired set of profiles:

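**Theorem 4.** abstractSID($\Phi$)($\mathsf{pred}$) $= \{\mathsf{profile}\_{\Phi}(\mathcal{M}) \mid \mathcal{M} \models\_{\Phi} \mathsf{pred}(\mathsf{fv}(\mathsf{pred}))$ *and* $\mathsf{FV}\_{\mathcal{M}} \subseteq \mathsf{fv}(\mathsf{pred})\}$*.*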

To check entailments $\mathsf{pred}\_1(\mathbf{a}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{b})$, where $\mathbf{a}$ and $\mathbf{b}$ do not coincide with the free variables of $\mathsf{pred}\_1$ and $\mathsf{pred}\_2$ in the rules of $\Phi$, it suffices to apply an additional rename operation. Hence, by combining Theorems 3 and 4, we obtain a constructive decidability proof for entailments between predicate calls. Moreover, a close inspection of the size of the set of profiles and the runtime of Algorithm 1 reveals that our decision procedure runs in time doubly exponential in the size of a given SID. A detailed analysis is found in [1, Sect. 7.4].

**Corollary 1.** *It is decidable in doubly exponential time whether the entailment* $\mathsf{pred}\_1(\mathbf{a}) \models\_{\Phi} \mathsf{pred}\_2(\mathbf{b})$ *holds.*

*Generalizations.* Several of our assumptions about SIDs and entailments have been made purely to simplify the presentation. In fact, Corollary 1 can be generalized to (1) entailments $\varphi \models\_{\Phi} \psi$ between symbolic heaps $\varphi, \psi$ (instead of predicate calls) and (2) SIDs with pure formulas. Both extensions are supported by our implementation. Further details are found in [1].

### **6 Experiments**

We implemented our decision procedure for entailment in the separation logic prover Harrsh [1,15], which is written in Scala. Harrsh supports the full SLbtw fragment, including pure formulas, parameter repetitions, and entailments between symbolic heaps (as opposed to single predicate calls). Table 1 summarizes the results of our evaluation for a selection of entailments and SIDs. Our full collection of 101 benchmarks and all experimental results are available online [1].

*Methodology.* We compared Harrsh against Songbird [19], the winner of the SID entailment category of this year's separation logic competition, SL-COMP'18; and against Slide [11], the tool that is most closely related to our approach but that is complete only for a subclass of SLbtw. Experiments were conducted using the popular benchmarking harness JMH on an Intel® Core™ i7-7500U CPU running at 2.70 GHz with a memory limit of 4 GB. We report the average run times obtained by running JMH on each benchmark for 100 s.

*Benchmarks.* Besides the running example (with sll, even and odd as in Fig. 5) and the entailments for doubly-linked trees discussed in the introduction (with ltree, rtree as defined in Fig. 1), we show results on standard data-structure specifications from the SL literature: Several variants of trees with linked leaves (tll [10], atll, tlllin) and doubly-linked lists (dllht [18] defining lists from head to tail, dllth from tail to head). Beyond lists and trees, we checked an entailment between *doubly-linked* 2*-grid segments* (see Fig. 7) defined forwards (dlgridr) and backwards (dlgridl).<sup>4</sup>

*Size of the abstraction.* Besides the run times, we report the size of the abstraction computed by Harrsh. More specifically, we report (1) the total number of profiles in the fixed point of abstractSID (#P), (2) the total number of context decompositions across all profiles (#D), and (3) the total number of contexts across all decompositions of all profiles (#C). This shows that even though the abstract domain **Profiles<sup>x</sup>**(Φ) is very large in general, Harrsh typically only needs to explore a small portion of it to decide an entailment.

<sup>4</sup> Formal definitions of all SIDs are found in the supplementary material [1].

**Table 1.** The performance of Harrsh (HRS), Songbird (SB) and Slide (SLD) on a variety of SIDs; and the size of the abstraction computed by Harrsh. The timeout (TO) was 180,000 ms. Termination before the timeout but without result is denoted (U). Wrong results/crashes are marked (X).


*Results.* Table 1 reveals that our decision procedure—being the first implemented decision procedure that is complete for the entire SL fragment SLbtw—is not only of theoretical interest, but can also solve challenging entailment problems efficiently in practice. While Slide was faster on some benchmarks that fall into the fragment defined in [11], as well as on some SIDs outside of that fragment, Harrsh was able to solve several benchmarks on which Slide failed. Two benchmarks led to errors: One wrong result and one program crash (the first and the second entries marked by (X) in Table 1, respectively). We are unsure whether the timeouts encountered on the TLL benchmarks are caused by a bug in Slide, as Slide is quite efficient on other TLL variants (see [11, Table 1]). Furthermore, note that Harrsh significantly outperformed Songbird, providing further evidence of the effectiveness of our profile-based abstraction.

### **7 Conclusion**

We presented an alternative proof for decidability of entailment in separation logic with bounded tree width [10]. In contrast to the original proof, we give a direct model theoretic construction. We implemented the resulting decision procedure in the tool Harrsh and obtained promising experimental results. For future work, we plan to extend our approach to the bi-abduction problem.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# Safety and Fault-Tolerant Systems

# **Digital Bifurcation Analysis of TCP Dynamics**

Nikola Beneš, Luboš Brim, Samuel Pastva, and David Šafránek

Faculty of Informatics, Masaryk University, Brno, Czech Republic {xbenes3,brim,xpastva,safranek}@fi.muni.cz

**Abstract.** Digital bifurcation analysis is a new algorithmic method for exploring how the behaviour of a parameter-dependent computer system varies with a change in its parameters and, in particular, for identifying bifurcation points where such variation becomes dramatic. We have developed the method in analogy with traditional bifurcation theory and have successfully applied it to models taken from systems biology. In this case-study paper, we demonstrate the appropriateness and usefulness of digital bifurcation analysis as a push-button alternative to the classical approaches traditionally used for analysing the stability of TCP/IP protocols. We consider two typical examples (congestion control and the influence of buffer sizes on throughput) and show that the method provides the same results as those obtained with classical non-automatic analytical and numerical methods.

### **1 Introduction**

The objective of bifurcation theory is to study qualitative changes to the properties of a parameter-dependent system as its parameters are varied. The method is typically applied to continuous-time or discrete-time dynamical systems. Even a tiny change in parameters may cause a dynamical system to exhibit entirely different qualitative features. Such dramatic changes in the topology of the phase space of a dynamical system are known as *bifurcations*, and the values of the parameters for which a bifurcation occurs are called *bifurcation points*. For a complete global understanding of a complex dynamical system, it is essential to know the bifurcation points, as well as the parameter ranges in which there is no fundamental change. A simple example of a real-life bifurcation is the phase transition of water to ice at the temperature of 0 °C. At this critical temperature, a tiny change in the temperature results in a "sudden" systematic change in the substance. The two materials are governed by a different set of parameters and qualitative properties. For example, we can talk about cracking ice but not water.

Non-linear dynamical systems appearing in physics, biology or economics are not the only source of bifurcation phenomena. Even computer systems can suddenly alter the quality of their behaviour. A simple example might be a significant performance degradation of a computation caused by system swapping. Studying bifurcations in computer systems can provide an additional formal analysis ingredient leading to a better understanding of critical system properties, like stability or robustness.

This work has been partially supported by the Czech Science Foundation grant No. 18-00178S.

© The Author(s) 2019

T. Vojnar and L. Zhang (Eds.): TACAS 2019, Part II, LNCS 11428, pp. 339–356, 2019. https://doi.org/10.1007/978-3-030-17465-1\_19

Inspired by the bifurcation theory for dynamical systems, we have developed an approach that allows analysing how the dynamics (runs, state transitions) of a discrete computer system change when its parameters are changed [6,11]. We call the method *digital bifurcation analysis*. In the approach, the qualitative changes in the behaviour are represented as changes in the truth-value of temporal formulae defining a specific behaviour (portrait) pattern of the system. The method for computing results of the bifurcation analysis (typically presented as *bifurcation diagrams*) uses our novel symbolic parallel parameter synthesis algorithm [3], which itself builds on model-checking technology. As the approach employs a hybrid temporal logic for which the algorithm is computationally demanding, we have also developed specialised algorithms that are dedicated to specific formulae/patterns and thus work more efficiently.

Examples of such patterns are attractors, which we see as a particular class of patterns representing the states in which the system's execution persists over a long time horizon, i.e., the so-called invariant subsets of the state space towards which the system's runs are attracted. In computer systems, the most typical attractors can be observed in the form of terminal strongly connected components (tSCCs) [38]. We have developed an efficient parallel algorithm for detecting tSCCs in parametrised graphs in [1], and we use this algorithm in our two case studies. We have already successfully applied the digital bifurcation analysis to several models from systems biology [4,5].

In this case-study paper, we report on the application of digital bifurcation analysis to the Transmission Control Protocol (TCP), which currently facilitates most internet communication. One of the severe problems in practical applications of TCP is congestion, which appears when the required resources exceed the capacity of internet communication. Over the past years, many internet congestion control mechanisms have been developed to ensure the reliable and efficient exchange of information across the internet, such as Active Queue Management (AQM). Bifurcation analysis of TCP under various congestion control mechanisms has been studied by several authors [16,25,30,32,40,41]. All have used a continuous-time model (e.g., the fluid model) and applied traditional mathematical methods of bifurcation analysis, including simulations, to detect parameter values at which the system passes through a critical point, loses its stability, and a so-called Hopf bifurcation occurs [22].

Our approach to bifurcation analysis does not require remodelling the given discrete system as a continuous-time dynamical system. Digital bifurcation analysis works directly on discrete models represented as state transition systems. Furthermore, unlike mathematical methods, the method is fully automatic and does not require mathematical skills to be used. Another advantage is that the method scales to state spaces with tens of variables and tens of possibly dependent parameters, thus significantly overcoming the limits of traditional mathematical methods. Last but not least, the method is advantageous in performing global bifurcation analysis, which is harder to compute than local analysis, where bifurcation points are expected to be approximately known in advance.

It is important to stress that the purpose of this case-study paper is not to propose any new congestion control mechanisms or protocols. We aim to provide a demonstration of the appropriateness and usefulness of the digital bifurcation analysis as a push-button technique that makes a promising alternative to the classical approaches when analysing stability and robustness of TCP protocol specifications and implementations. To that end, we consider two different case studies targeting TCP. In both of them, we analyse how the structure and quality of attractors change when the parameters change. The first one deals with TCP that uses the Random Early Detection (RED) method [14] as an active queue management mechanism to control congestion. Although the RED mechanism alone is easy to understand, its interaction with TCP connections is rather complicated and is not well understood. In [33] the authors used a deterministic non-linear dynamical model of the TCP-RED protocol (together with detailed simulations) to demonstrate that the model exhibits a transition between a stable fixed point and an oscillatory or chaotic behaviour as parameters are varied. In our case study, we were able to achieve the same results fully automatically using our method. In the second example, we consider TCP itself combined with essential performance-oriented extensions. We analyse how the sizes of the send and receive socket buffers influence the throughput; in particular, we identify the combinations of sizes (bifurcation points) for which we observe a dramatic drop. The results we have achieved are in accordance with [28].

It is worth noting that bifurcation analysis provides a conceptually very different view of the protocol functionality than what is usually addressed by formal verification methods. The goal of verification is to prove the correctness of a system specification for all initial states and in the case of parametrised verification also regardless of the number of its components, or the parametrised domain of variables. On the other hand, the goal of bifurcation analysis of parametrised systems is to identify parameter values for which the system suddenly changes its behaviour regardless of its correctness.

Several examples of TCP protocol verification can be found in [8,13,18,23,35–37]. As regards parametric verification, the Bounded Retransmission Protocol (BRP) has been checked for manually derived constraints by parametric model checking in [19], and the Stop-and-Wait Protocol (SWP) has been targeted in [15] for all possible values of the maximum sequence number and the maximum number of retransmissions. We are not aware of any formal verification method that would address the bifurcations of the protocol behaviour.

Finally, we discuss the approaches related to bifurcation analysis. To the best of our knowledge, the only related approach to bifurcation analysis that also employs methods of formal verification has been presented in [20,21]. The authors address the identification of bifurcation points in non-trivial dynamics of a numerical cardiac-cell model represented using a hybrid automaton. The method is based on guided-search-based bounded-time reachability analysis used to estimate ranges of parameter values displaying two complementary patterns of system behaviour. These ranges are computed for bounded-time reachability and over-approximated up to a particular δ-precision due to the underlying δ-decision algorithm.

### **2 Attractor Analysis Workflow**

We first describe the standard scenario for digital bifurcation analysis focused on attractor analysis. The input is a parametrised system and a classification of the stability-based attractor properties that we are interested in. In the model design phase, the system is formalised as a discrete finite-state model, which the state-space generation procedure subsequently turns into a parametrised graph. How the initial model is obtained and what language the model is written in is domain-specific and is explained later when describing the case studies. The classification of the attractor properties specifies what shapes and forms of attractors we want to consider distinct enough to express a dramatic change in the system's behaviour. In the simplest case, which we call the *counting version* of our problem, we may be merely interested in the number of attractors and consider two parametrisations of a system non-equivalent if this number changes. More interesting cases may classify the attractors according to various stability-related properties, such as oscillations. The core parametric analysis algorithm then computes the parametric tSCC map. The resulting map is post-processed, producing e.g. a visualisation of the bifurcation diagram, plots, tables, etc. The workflow of our method for the digital bifurcation analysis of attractors is summarised in Fig. 1.

**Fig. 1.** Attractor analysis method workflow.

In general, our digital bifurcation analysis algorithm presupposes that the state space of the model has the form of a parametrised Kripke structure. In this case study, we are interested in attractor properties that are independent of the atomic proposition valuation. We therefore consider a simpler formalism here, namely that of parametrised graphs, which are directed graphs with self-loops allowed and edges labelled by parameters taken from a given parameter set.

**Definition 1.** *A* graph *is a pair* (V, E) *where* V *is a finite set of* vertices *and* E ⊆ V × V *is a set of* edges*. A* parametrised graph *is a triple* G = (V, E, P) *where* P *is a set of* parametrisations *and* E : V × V → 2<sup>P</sup> *such that for each* p ∈ P*,* G<sub>p</sub> = (V, E<sub>p</sub> = {(u, v) | p ∈ E(u, v)}) *is a graph. We call* G<sub>p</sub> *the* projection of G on p*.*
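As a minimal illustration of Definition 1, the following Python sketch stores a parametrised graph with an explicitly enumerated, finite parameter set and computes a projection. The dictionary encoding, the name `project` and the example data are assumptions made for this sketch; they are not taken from the authors' implementation, which represents parameter sets symbolically.

```python
from typing import Dict, FrozenSet, Set, Tuple

Vertex = str
Param = int
# E maps an ordered pair (u, v) to the set of parametrisations enabling the edge.
ParamEdges = Dict[Tuple[Vertex, Vertex], FrozenSet[Param]]


def project(edges: ParamEdges, p: Param) -> Set[Tuple[Vertex, Vertex]]:
    """Return E_p = {(u, v) | p in E(u, v)}, the edge set of the projection G_p."""
    return {uv for uv, params in edges.items() if p in params}


# Hypothetical example with P = {0, 1}.
example_edges: ParamEdges = {
    ("a", "b"): frozenset({0, 1}),
    ("b", "a"): frozenset({0}),
    ("b", "b"): frozenset({1}),
}
```

Here `project(example_edges, 0)` keeps the cycle between `a` and `b`, whereas `project(example_edges, 1)` replaces the back edge by a self-loop on `b`.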

To be able to investigate the properties of the attractors in the system, we need to use a notion that is analogous to an attractor in a parametrised graph. In dynamical systems theory, an attractor [27] is the smallest set of states (points in the phase space) invariant under the system dynamics. Parametrised graphs can be regarded as discrete abstractions of a dynamical system in which the dynamics are represented using paths in the graph. The respective abstraction of the notion of an attractor thus coincides with the notion of a terminal strongly connected component (tSCC) of a graph.

**Definition 2.** *Let* G = (V, E) *be a graph. We say that a vertex* t ∈ V *is* reachable *from a vertex* s ∈ V *if* (s, t) ∈ E<sup>∗</sup> *where* E<sup>∗</sup> *denotes the reflexive and transitive closure of* E*. A set of vertices* C ⊆ V *is* strongly connected *if* v *is reachable from* u *for any two vertices* u, v ∈ C*. A* strongly connected component *(SCC) is a* maximal *strongly connected set* C ⊆ V*, i.e. such that no* C′ *with* C ⊊ C′ ⊆ V *is strongly connected. A strongly connected component* C *is called* terminal *(tSCC) if* (C × (V \ C)) ∩ E = ∅*, i.e. there are no edges leaving* C*.*

We are now ready to state the algorithmic problem whose solution forms the basis of our method.

**Terminal SCCs Enumeration Problem.** Let G = (V, E, P) be a parametrised graph. The goal is to enumerate, for every parametrisation p ∈ P, all tSCCs in the graph G<sub>p</sub>, the projection of G on p.

In this general version of our problem, the output is going to be a mapping that assigns to each p ∈ P the set of all tSCCs of G<sub>p</sub>. We call this the *parametric tSCC map*. This map may then be further processed and visualised. We are mainly interested in the *bifurcation diagram* of the model. This diagram is a plot which partitions the parameter space into regions where the behaviour of the system is qualitatively invariant. In the case of a single parameter, this type of one-dimensional diagram is typically augmented by a second dimension which presents the location of the tSCCs with respect to a chosen system variable.
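To make the notion of the parametric tSCC map concrete, here is a deliberately naive Python sketch that treats every parametrisation separately and finds tSCCs by brute-force reachability. It is only a reference point under the assumption of a small, explicitly enumerated parameter set; the function names are hypothetical, and this is not the parallel algorithm summarised in Sect. 2.1.

```python
from collections import defaultdict


def reachable(succ, start):
    """All vertices reachable from `start` (reflexive and transitive closure)."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in succ.get(u, ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def terminal_sccs(vertices, edge_set):
    """Enumerate the tSCCs of one ordinary (non-parametrised) graph."""
    succ = defaultdict(set)
    for u, v in edge_set:
        succ[u].add(v)
    reach = {v: reachable(succ, v) for v in vertices}
    result = set()
    for v in vertices:
        # v lies in a tSCC iff every vertex reachable from v can reach v back;
        # the tSCC is then exactly the forward-reachable set of v.
        if all(v in reach[u] for u in reach[v]):
            result.add(frozenset(reach[v]))
    return result


def parametric_tscc_map(vertices, param_edges, params):
    """Map each parametrisation p to the set of tSCCs of the projection G_p."""
    return {
        p: terminal_sccs(
            vertices, {uv for uv, ps in param_edges.items() if p in ps}
        )
        for p in params
    }
```

Grouping parametrisations by `len(tscc_map[p])` then yields the regions of the counting version of the problem.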

To be able to distinguish between qualitatively different behaviours of the system, we need to formalise the classification of stability-based attractor properties in terms of tSCCs. We thus get a classification function that separates tSCCs into classes. Two parametrisations of a system are then said to be qualitatively different if their respective graphs differ in the count of tSCCs belonging to some class. In the case of the counting version, we thus consider one class of tSCCs only. Here, parametrisations of a system are considered to be qualitatively different if their graphs contain a different number of tSCCs. In the more detailed cases, we can classify tSCCs according to size (small vs large), density (sparse vs dense), graph-specific properties (bipartite vs non-bipartite) etc.

As an example of how these classifications relate to classical bifurcation analysis, we may see bipartite tSCCs as representing oscillatory patterns in attractors. The change from a small non-bipartite tSCC to a bipartite tSCC can thus be seen as an analogy of the Hopf bifurcation. In our two case studies, we distinguish between sinks (single-state tSCCs), bipartite (oscillatory) tSCCs, and other tSCCs, which are further differentiated between small and large, based on a chosen domain-specific threshold.
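The four classes used in the case studies can be sketched as a small classification function. The version below is an illustration under stated assumptions: size is measured as the number of vertices, bipartiteness is checked on the underlying undirected graph of the component, and the names `classify_tscc` and `small_threshold` are hypothetical.

```python
from collections import deque


def classify_tscc(component, edge_set, small_threshold):
    """Classify a tSCC as 'sink', 'bipartite', 'small' or 'large'."""
    comp = set(component)
    if len(comp) == 1:
        return "sink"
    # Undirected adjacency restricted to the component; a self-loop makes a
    # vertex its own neighbour, which rules out a 2-colouring.
    adj = {v: set() for v in comp}
    for u, v in edge_set:
        if u in comp and v in comp:
            adj[u].add(v)
            adj[v].add(u)
    # Breadth-first 2-colouring; the component is strongly connected, so one
    # start vertex reaches all of it.
    start = next(iter(comp))
    colour = {start: 0}
    queue = deque([start])
    bipartite = True
    while queue and bipartite:
        u = queue.popleft()
        for v in adj[u]:
            if v not in colour:
                colour[v] = 1 - colour[u]
                queue.append(v)
            elif colour[v] == colour[u]:
                bipartite = False
                break
    if bipartite:
        return "bipartite"
    return "small" if len(comp) <= small_threshold else "large"
```

A two-state cycle is reported as bipartite (oscillatory), while a three-state cycle is small or large depending on the chosen threshold.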

The rest of this section gives a brief overview of the parallel algorithm for solving the tSCCs enumeration problem that we have developed in [1].

#### **2.1 Core Algorithm**

First, note that a simple *sequential* solution to the problem is to use any reasonable SCC decomposition algorithm (e.g. Tarjan's [39]) and enumerate the tSCCs in the residual graph. However, all known optimal sequential SCC decomposition algorithms use the depth-first search algorithm, which is suspected to be non-parallelisable [34]. There are known parallel SCC decomposition algorithms; for a survey, we refer to [2]. Our approach is based on the observation that we do not have to compute all of the SCCs to enumerate the terminal ones.

Furthermore, instead of scanning through all parametrisations and solving the problem for each of them separately, our approach deals with sets of parametrisations directly. This makes our algorithm suitable for use in connection with various kinds of symbolic set representations. The reason for using a parallel algorithm is the necessity to deal with the high computational demands of the method, as discussed in [1].

The main idea of the Terminal Component Detection (TCD) algorithm lies in repeated reachability, which is known to be easily parallelisable. To explain the method, we start with a non-parametrised version of the algorithm. The following explication is illustrated in Fig. 2. Let us assume a given (non-parametrised) graph G = (V, E). We choose an arbitrary vertex v ∈ V (denoted by the double circle in the illustration) and compute all vertices reachable from v; let us call the resulting set of vertices F. We further compute the set of all vertices backwards-reachable from v inside F; we call the resulting set B. Finally, we compute all vertices backwards-reachable from any vertex of F; let us call this set B′.

**Fig. 2.** Illustration of the non-parametrised version of our algorithm.

Clearly, B is an SCC of the graph, and moreover, it is a terminal SCC iff F \ B is empty. Furthermore, B′ \ F contains no tSCCs: all vertices in B′ \ F have a path to a vertex in F. We recursively run the algorithm in F \ B and V \ B′ if non-empty. Observe that no tSCC may intersect both of these sets, and these two sub-problems can thus be dealt with independently (i.e. in parallel). Note that every time the algorithm is (recursively) started, its input is an induced subgraph of the original graph that satisfies the precondition that all its tSCCs are tSCCs of the original graph. These observations together imply the correctness of the algorithm.
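The non-parametrised procedure described above can be sketched in a few lines of Python. This is a sequential illustration only: the two independent sub-problems are placed on a work list rather than run in parallel, the pivot is chosen arbitrarily without the heuristics of [1], and no trimming is performed. The names `reach` and `terminal_components` are hypothetical.

```python
def reach(universe, edges, sources, backward=False):
    """Vertices reachable from `sources` along edges (or reversed edges when
    `backward` is set), using only vertices of `universe`."""
    step = {}
    for u, v in edges:
        if u in universe and v in universe:
            a, b = (v, u) if backward else (u, v)
            step.setdefault(a, set()).add(b)
    seen = set(sources)
    stack = list(sources)
    while stack:
        x = stack.pop()
        for y in step.get(x, ()):
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def terminal_components(vertices, edges):
    """Enumerate all tSCCs by repeated forward/backward reachability."""
    result = []
    work = [set(vertices)]
    while work:
        universe = work.pop()
        if not universe:
            continue
        pivot = next(iter(universe))               # arbitrary starting vertex v
        F = reach(universe, edges, {pivot})        # forward-reachable from v
        B = reach(F, edges, {pivot}, backward=True)        # SCC of v
        B_prime = reach(universe, edges, F, backward=True)  # can reach F
        if F == B:
            result.append(frozenset(B))            # F \ B empty: B is terminal
        else:
            work.append(F - B)                     # tSCCs reachable from v
        work.append(universe - B_prime)            # part that cannot reach F
    return result
```

Because `F - B` and `universe - B_prime` are disjoint and closed under successors, each work item could be handed to a separate worker without coordination.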

The asymptotic complexity of the algorithm in its non-parametric version is of the order O(|V| · (|V| + |E|)), as in the worst case, every iteration may eliminate a single vertex of the graph. The actual performance of the algorithm strongly depends on the choice of the initial vertex v. If we consistently choose v that lies close to (or directly in) a tSCC of the graph, the complexity becomes linear. Of course, such a choice cannot be made in advance. The paper [1] discusses the impact of several heuristics that try to approximate this choice.

The algorithm can also be made more efficient using a *trimming* subprocedure in the manner of [26], i.e. removing all vertices without incoming edges. In Fig. 2, the removed vertices are marked in grey; furthermore, the V \ B′ part of the graph contains one vertex that would be removed in the next recursive run.

To extend the basic idea to parametrised graphs, we use a notion of parametrised sets of vertices. Formally, a parametrised set of vertices Â is a function Â : V → 2<sup>P</sup>. To deal with parametrised sets, we use a generalisation of the standard set operations. All the operations are performed element-wise, e.g. the union of parametrised sets Â ∪ B̂ is defined as the parametrised set Ĉ such that Ĉ(v) = Â(v) ∪ B̂(v) for all v. The parametrised set of all vertices and all parametrisations is given by V̂ such that V̂(v) = P for all v ∈ V.

The notions of the forward and backward reachable sets can be easily extended to the parametrised setting. They can be computed by a fixed-point algorithm which iterates the parametrised successor (or predecessor) operator. Given a parametrised set of vertices X̂, the successor operator computes the parametrised set Ŷ such that Ŷ(v) = X̂(v) ∪ ⋃<sub>u∈V</sub> (X̂(u) ∩ E(u, v)) and similarly for the predecessor operator.
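The successor operator and the resulting fixed-point computation can be written down directly when parameter sets are explicit Python sets. The sketch below assumes such an explicit representation (the paper uses a symbolic interval encoding instead); `successor` and `forward_reachable` are hypothetical names.

```python
def successor(X, edges, vertices):
    """One application of the parametrised successor operator:
    Y(v) = X(v) ∪ ⋃_u (X(u) ∩ E(u, v))."""
    Y = {v: set(X.get(v, ())) for v in vertices}
    for (u, v), params in edges.items():
        Y[v] |= set(X.get(u, ())) & set(params)
    return Y


def forward_reachable(X, edges, vertices):
    """Iterate the successor operator until a fixed point is reached."""
    current = {v: set(X.get(v, ())) for v in vertices}
    while True:
        nxt = successor(current, edges, vertices)
        if nxt == current:
            return current
        current = nxt
```

For instance, with E(a, b) = {0, 1} and E(b, c) = {1}, starting from a for both parametrisations reaches b under 0 and 1 but c only under 1. The predecessor operator is obtained by swapping the roles of u and v.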

The parametrised algorithm then proceeds as described above for the non-parametrised case, with ordinary sets replaced by parametrised sets. One further key difference is that instead of choosing one starting vertex, we need to choose a set of starting vertices with disjoint parametrisation sets that together cover all parametrisations that are present in the currently explored parametrised subgraph. The reason for this, as well as a discussion on heuristics that allow choosing such sets efficiently, can again be found in [1].

In the worst case, when parametrisations are represented explicitly, the asymptotic complexity of the algorithm is of the order O(|P| · |V| · (|V| + |E|)). The actual performance of the algorithm depends on various choices and heuristics. It can also be strongly influenced by the usage of a symbolic encoding of the parametrised sets. In this paper, the sets of parametrisations are represented symbolically using an interval encoding, similar to the one used in [10]. Other options for a symbolic representation of parameters include SMT formulae [3].

### **3 Case Studies**

In this section, we present two case studies focusing on discovering bifurcations in the behaviour of the TCP protocol. Each of them addresses a different essential aspect of the protocol, namely congestion control and packet flow stability. We demonstrate how the digital bifurcation analysis can aid in the design, analysis and control of these discrete reactive systems.

In the first case study, we consider a relatively common setting in the standard bifurcation theory: A discrete map governing the behaviour of the RED congestion control mechanism. This mechanism prevents congestion on network nodes such as routers and is subject to changes in its behaviour due to different internal and external parameters. We show how different parameters influence the stability of the mechanism and how a hypothetical system administrator or an automated controller can use this information to avoid faulty behaviour.

The second case study presents an entirely discrete model of the basic TCP focusing on the stability of packet flow. We study the influence of the sender and receiver buffer sizes on the behaviour of the protocol and its ability to transfer packets in a timely manner. We assume the role of a hypothetical protocol designer and consider a set of extensions and modifications to the protocol proposed by various networking experts. We observe that such extensions and their interplay can introduce bifurcations leading to serious degradation of the protocol performance.

The case studies are implemented with the help of the tool Pithya [7] which provides the necessary parametrised graph analysis algorithms. The source code of this implementation is available at https://github.com/sybila/tcp-bifurcation. All experiments were performed on a typical 4-core 3 GHz desktop computer with 16 GB of RAM.

#### **3.1 Instabilities in TCP-RED**

This case study addresses the congestion control in TCP. The congestion control mechanism prevents the protocol from overloading the network with too many packets. The problem has two important aspects. The first aspect is the congestion control on the sender side that has to ensure maximal throughput for a single flow of packets. The second aspect is the congestion control on other network nodes, such as routers, where several connections meet.

One of the common approaches to implementing congestion control on routers is the Random Early Detection (RED) method proposed in [14]. This technique explicitly drops packets as the router queue starts to fill up. Consequently, senders are indirectly notified (by observing the packet loss) that the link is approaching a congested state before the situation becomes critical.

**Model Description.** To study the RED mechanism, we use a discrete time model proposed in [33]. In Fig. 3, we present the model equations and a basic description of all model variables and constants. Detailed aspects of the model design are given in the original paper.

$$p\_t(\overline{q}\_t) = \begin{cases} 0 & \overline{q}\_t \in [0, q\_l] \\ \frac{\overline{q}\_t - q\_l}{q\_u - q\_l} p\_{max} & \overline{q}\_t \in (q\_l, q\_u) \\ 1 & \overline{q}\_t \in [q\_u, B] \end{cases} \quad (1) \quad q\_t(p\_t) = \begin{cases} B & p\_t \in [0, p\_l] \\ \frac{n \cdot k}{\sqrt{p\_t}} - \frac{c \cdot d}{m} & p\_t \in (p\_l, p\_u) \\ 0 & p\_t \in [p\_u, 1] \end{cases} \quad (2)$$

$$\overline{q}\_{t+1}(\overline{q}\_t) = (1 - w) \cdot \overline{q}\_t + w \cdot q\_t(p\_t(\overline{q}\_t)) \quad (3)$$

$$\text{maximum buffer size } B = 3750 \qquad \text{drop rate } p\_t \in [0, 1]$$

$$\text{lower queue threshold } q\_l = 250 \qquad \text{queue size } q\_t \in [0, B]$$

$$\text{upper queue threshold } q\_u = 750 \qquad \text{average queue size } \overline{q}\_t \in [0, B]$$

$$\text{maximum drop rate } p\_{max} = 0.1 \qquad \text{lower drop threshold } p\_l = \left(\frac{n \cdot m \cdot k}{dc + Bm}\right)^2$$

$$\text{number of TCP connections } n = 250 \qquad \text{upper drop threshold } p\_u = \left(\frac{n \cdot m \cdot k}{dc}\right)^2$$

$$\text{propagation delay } d = 0.1\,\text{s} \qquad \text{rate constant } k = \sqrt{3/2}$$

$$\text{link capacity } c = 75\,\text{Mb/s} \qquad \text{packet size } m = 4\,\text{kb}$$

$$\text{averaging weight } w = 0.15$$

**Fig. 3.** A discrete time model of the RED congestion control behaviour. The individual constants are stated with basic explanations and default values. The parameters are selected from the given set of constants and their bounds are specified later with the corresponding experiments.

The model assumes n connections flowing through a single RED-capable router. All connections share basic properties, namely the packet size and the propagation delay. In such a case, the situation can be simplified by considering only a single combined flow, as the router cannot differentiate between the individual flows anyway. The router then maintains the current drop rate p<sub>t</sub> (Eq. 1) and the queue size q<sub>t</sub> (Eq. 2) based on the current exponentially weighted average queue size q̄<sub>t</sub> (Eq. 3).
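For readers who want to experiment with the discrete map, the following Python sketch transcribes Eqs. 1–3 with the default constants of Fig. 3. Two points are assumptions of this sketch rather than statements of the source: the value `P_MAX = 0.1` for the maximum drop rate and the reading of the packet size as 4 kb = 4000 bits (so that c·d/m is a packet count). All function names are hypothetical.

```python
import math

B = 3750.0            # maximum buffer size (packets)
Q_L, Q_U = 250.0, 750.0   # lower and upper queue thresholds
P_MAX = 0.1           # maximum drop rate (assumed value)
N = 250               # number of TCP connections
D = 0.1               # propagation delay (s)
C = 75e6              # link capacity (bit/s)
M = 4000.0            # packet size (bits, assumed 4 kb = 4000 bits)
K = math.sqrt(3.0 / 2.0)  # rate constant
W = 0.15              # averaging weight


def drop_rate(q_avg, q_l=Q_L, q_u=Q_U, p_max=P_MAX):
    """Eq. 1: drop rate as a function of the average queue size."""
    if q_avg <= q_l:
        return 0.0
    if q_avg < q_u:
        return (q_avg - q_l) / (q_u - q_l) * p_max
    return 1.0


def drop_thresholds(n=N, d=D):
    """Lower and upper drop thresholds p_l and p_u from Fig. 3."""
    p_l = (n * M * K / (d * C + B * M)) ** 2
    p_u = (n * M * K / (d * C)) ** 2
    return p_l, p_u


def queue_size(p, n=N, d=D):
    """Eq. 2: queue size as a function of the drop rate."""
    p_l, p_u = drop_thresholds(n, d)
    if p <= p_l:
        return B
    if p < p_u:
        return n * K / math.sqrt(p) - C * d / M
    return 0.0


def step(q_avg, w=W, n=N, d=D):
    """Eq. 3: next exponentially weighted average queue size."""
    return (1.0 - w) * q_avg + w * queue_size(drop_rate(q_avg), n, d)
```

Iterating `step` from some initial average queue size for different values of `w` or `n` gives a simulation-style view of the behaviour that the parametrised graph analysis explores exhaustively.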

A typical scenario is that a network administrator takes control over parameters such as the averaging weight w or the queue thresholds q<sub>l</sub> and q<sub>u</sub>. Furthermore, it is also important to consider the influence of the connection count n and the propagation delay d, as these numbers will change depending on the current network load.

**Parametrised Graph.** To analyse the model, we require a finite parametrised graph G = (V, E, P). Here, P is the parameter space given by the chosen model parameters (we specify the chosen parameters for each experiment later). In Eq. 3, we write q̄<sub>t+1</sub>(q̄<sub>t</sub>, λ) for λ ∈ P to specify the parametrised version of the model.

We assume s + 1 thresholds t<sub>0</sub> < t<sub>1</sub> < … < t<sub>s</sub> such that t<sub>0</sub> = 0 and t<sub>s</sub> = B. These thresholds partition the state space of the variable q̄ into s intervals [t<sub>0</sub>, t<sub>1</sub>], …, [t<sub>s−1</sub>, t<sub>s</sub>], denoted as I<sub>1</sub>, …, I<sub>s</sub>. These intervals then represent the vertices of our parametrised graph: V = {I<sub>i</sub> | i ∈ [1, s]}.

Next, we construct the parametrised edges between our intervals so that they over-approximate the behaviour of the original discrete map. Let us consider two intervals $I_i$ and $I_j$ and the edge from $I_i$ to $I_j$. Clearly, the set of parametrisations $E(I_i, I_j)$ has to include all parametrisations $\lambda$ such that for some $q_t \in I_i$ it holds that $q_{t+1}(q_t, \lambda) \in I_j$. We compute these sets using interval arithmetic, ensuring that all such parametrisations are included.
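As a rough illustration of this construction, the following Python sketch builds the parametrised edges for a toy one-dimensional map over a finite grid of parameter values. The map `toy_map`, the grid of parameters, and all function names are assumptions made for this example; they are not the RED model of Fig. 3. Because the toy map is monotone in q, evaluating it at the interval endpoints yields the exact interval image, which stands in here for general interval arithmetic. Intervals are indexed from 0 in the code.

```python
from typing import Callable, Dict, List, Set, Tuple

Interval = Tuple[float, float]


def make_intervals(thresholds: List[float]) -> List[Interval]:
    """Turn thresholds t_0 < ... < t_s into the intervals [t_{i-1}, t_i]."""
    return list(zip(thresholds[:-1], thresholds[1:]))


def toy_map(q: float, lam: float, B: float = 10.0) -> float:
    """Hypothetical monotone map (not the RED model): affine growth clamped to [0, B]."""
    return max(0.0, min(B, lam * q + 1.0))


def image(iv: Interval, lam: float, f: Callable[[float, float], float]) -> Interval:
    """Interval image of iv under f; exact here because f is monotone in q."""
    lo, hi = f(iv[0], lam), f(iv[1], lam)
    return (min(lo, hi), max(lo, hi))


def overlaps(a: Interval, b: Interval) -> bool:
    """Closed intervals intersect (touching endpoints count, which over-approximates)."""
    return a[0] <= b[1] and b[0] <= a[1]


def parametrised_edges(
    thresholds: List[float],
    lams: List[float],
    f: Callable[[float, float], float],
) -> Dict[Tuple[int, int], Set[float]]:
    """Map each edge (i, j) to the set of grid parameters for which it exists."""
    ivs = make_intervals(thresholds)
    edges: Dict[Tuple[int, int], Set[float]] = {}
    for i, src in enumerate(ivs):
        for lam in lams:
            img = image(src, lam, f)
            for j, dst in enumerate(ivs):
                if overlaps(img, dst):
                    edges.setdefault((i, j), set()).add(lam)
    return edges
```

Adding thresholds makes the intervals smaller and the edge sets tighter, at the cost of a larger graph.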

Finally, since our graph over-approximates the original discrete map, each tSCC over-approximates some attractor(s) of the original system. Furthermore, the precision of this over-approximation can be refined by introducing additional thresholds or by replacing interval arithmetic with a more sophisticated approximation method, e.g., Taylor models [24].

**Analysis Results.** The analysis procedure consists of two scenarios:

*Scenario 1:* Consider a system designer who studies the effects of parameters to find settings that ensure stable behaviour of the protocol. In Fig. 4(a) and (b), the locations and types of attractors are shown for parameters w and n, respectively. Increasing the parameter w has a destabilising effect: the small (stable) tSCC (component size ≤ 0.01 · B) turns into a bipartite tSCC (representing oscillation) and finally into a large non-bipartite tSCC. The effect of the connection count n is the opposite: a higher number of connections stabilises the behaviour (Fig. 4b). Additionally, the protocol behaves as expected in the stable region: w does not influence the location of the steady state, whereas a higher number of connections requires larger queue sizes to accommodate the increased data flow. Using this kind of analysis, a general overview of the system's behaviour w.r.t. the given parameters can be obtained directly in a matter of minutes.

**Fig. 4.** Bifurcation diagrams showing the location and character of the tSCC depending on model parameters in the RED model. The green region indicates a small component (≤0.01 · B), the blue region shows oscillatory behaviour (bipartite graph), and the red region corresponds to a large non-bipartite tSCC. (a) w ∈ [0.1, 0.2] and n = 250; (b) n ∈ [200, 300] and w = 0.15; (c) w ∈ [0.1, 0.2] and n ∈ [200, 300]. (Color figure online)

*Scenario 2:* Assume an administrator (or an automated controller) is supposed to adjust the parameter w to preserve the correct functionality of the system subject to a varying number of connections n. Fig. 4c shows how the character of the attractor changes with the controllable parameter w and the external condition n. This allows the administrator to select optimal values for the given situation. Note that while this specific type of diagram does not show the concrete location of components, this information is still contained in the results of the method and can be used to support the decision further. While this type of analysis is certainly more computationally challenging, it can still be performed in under one hour.
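The three attractor classes used in Fig. 4 can be sketched for a single fixed parametrisation as follows. This is a minimal illustration, not the authors' implementation: the graph is a plain adjacency dictionary in which every vertex appears as a key, terminal components are found by a simple quadratic reachability check, and `small_limit` counts vertices, whereas the paper measures the component size relative to B. All names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set

Graph = Dict[int, Set[int]]  # every vertex must appear as a key


def reachable(g: Graph, start: int) -> Set[int]:
    """All vertices reachable from start (start itself included)."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for u in g[v]:
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


def terminal_sccs(g: Graph) -> List[FrozenSet[int]]:
    """A vertex lies in a terminal SCC iff everything it reaches can reach it back."""
    reach = {v: reachable(g, v) for v in g}
    found = {frozenset(reach[v]) for v in g if all(v in reach[u] for u in reach[v])}
    return list(found)


def is_bipartite(g: Graph, comp: FrozenSet[int]) -> bool:
    """Two-colour the component, ignoring edge directions; a self-loop rules it out."""
    nbrs: Dict[int, Set[int]] = {v: set() for v in comp}
    for v in comp:
        for u in g[v]:
            if u == v:
                return False
            nbrs[v].add(u)
            nbrs[u].add(v)
    start = next(iter(comp))
    colour = {start: 0}
    stack = [start]
    while stack:
        v = stack.pop()
        for u in nbrs[v]:
            if u not in colour:
                colour[u] = 1 - colour[v]
                stack.append(u)
            elif colour[u] == colour[v]:
                return False
    return True


def classify(g: Graph, comp: FrozenSet[int], small_limit: int) -> str:
    """Label a terminal component as small, oscillating (bipartite), or large."""
    if len(comp) <= small_limit:
        return "small"
    return "oscillating" if is_bipartite(g, comp) else "large"
```

Repeating the classification for each parametrisation in a grid gives the kind of coloured diagram shown in Fig. 4c.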

#### **3.2 Packet Flow Stability**

The TCP specification as defined in RFC 793 [31] provides a fundamental description of the TCP protocol, such as the packet format and the state machine for event processing. However, many implementation and performance aspects were not addressed in the original specification. Therefore, in the subsequent years, several extensions and improvements of the protocol functionality have been introduced [9,12,29].

Nowadays, many well-tested, production-ready implementations of TCP exist. However, as demonstrated in [28], non-standard network configurations and combinations of various modifications can cause problems even in well-established implementations. Furthermore, new implementations are still being developed where such fundamental problems can easily re-appear [17].

In this case study, we assume the role of a hypothetical protocol engineer. We introduce a basic parametrised model of TCP according to RFC 793 [31] extended with two performance-oriented modifications, namely *delayed acknowledgement* and *Nagle's algorithm*. We observe that these modifications, while useful in many instances, can introduce unexpected bifurcations in the behaviour of the protocol. Additionally, we compare our results with [28].

**Model Description.** We consider a model of TCP based on RFC 793 [31] extended with Nagle's algorithm according to RFC 896 [29] and delayed acknowledgement according to RFC 813 [12] and RFC 1122 [9]. We assume a single sender which sends a unidirectional infinite stream of data to a single receiver connected by a reliable link with unlimited capacity. As parameters, we assume a fixed maximal buffer size S for the sender and R for the receiver. Finally, the size of each packet is limited by the Maximum Segment Size (MSS) set by the network administrator.

Since we are not interested in the exact values of the transmitted data bytes, we can model the state of the protocol using the number of bytes in each protocol phase. This abstraction leads to the following five state variables:


Furthermore, we use outstanding to denote the number of unacknowledged bytes (U plus the sum of all elements in D and A). Since the protocol is not limited by the link capacity, we assume the available window is always equal to min(S, R) minus outstanding bytes. Notice that all the bytes considered by the model variables must be stored in the send buffer (the sender must keep the data until acknowledgement arrives), whereas only the bytes waiting to be acknowledged are stored in the receive buffer.

The dynamics of the model are governed by a set of discrete asynchronous events. Each event can only be executed when its preconditions are met. As our parametrised graph, we consider the graph of the protocol states reachable from the initial configuration where all channels are empty and all variables are zero. The model consists of the following discrete events:

*Copy data from the application:* Before sending, the data needs to be copied from the application to the kernel memory where the networking layer operates. This occurs in 1024-byte chunks such that, at least after every four chunks, the copying is interrupted so that available data can be sent right away if possible [28]:

$$\begin{aligned} \mathsf{W} &= \mathsf{W} + k \cdot 1024; \text{ where } k \in [1..4] \text{ is maximal} \\ &\quad \text{such that } (k \cdot 1024 + \mathsf{W} + \mathsf{outstanding} \le S) \end{aligned}$$

*Send full packet:* When MSS unsent bytes are available in the send buffer and the window capacity is sufficient, a full packet can be constructed and sent:

$$\mathsf{W} = \mathsf{W} - \text{MSS};\ \mathsf{D} = append(\mathsf{D}, \text{MSS});\ \text{when } (\mathsf{window} \ge \text{MSS} \land \mathsf{W} \ge \text{MSS})$$

*Send partial packet:* When fewer than MSS unsent bytes are available, or the window is not large enough, the protocol can decide to send a partial packet. This decision is governed by Nagle's algorithm, which dictates that a partial packet can be sent only when there are no outstanding bytes. This criterion prevents the sender from sending unnecessarily small packets in an unbuffered stream of data:

$$\begin{aligned} \mathsf{W} &= \mathsf{W} - packet;\ \mathsf{D} = append(\mathsf{D}, packet);\ \text{where} \\ &\quad (packet = min(\mathsf{window}, \text{MSS}, \mathsf{W}) \land \mathsf{outstanding} = 0) \end{aligned}$$

*Receive and acknowledge packet:* The receiver can process and acknowledge any data packet (we assume the data is immediately handed over to the application). However, to avoid a large number of small acknowledgement packets, the packet acknowledgement is often delayed until a sufficient amount of data is received (RFC 813). In our case, we use the threshold specified in [28] – 35% of R. In RFC 1122, this rule is further augmented to send an acknowledgement packet whenever two full segments are received:

$$\begin{aligned} \mathsf{A} &= append(\mathsf{A}, \mathsf{U} + head(\mathsf{D})); \ \mathsf{D} = tail(\mathsf{D}); \ \mathsf{U} = 0; \ \text{when} \\ &\quad (|\mathsf{D}| > 0 \land \mathsf{U} + head(\mathsf{D}) \ge min(0.35 \cdot R, 2 \cdot \text{MSS})) \end{aligned}$$

*Receive without acknowledgement:* When the rules of delayed acknowledgement are not met, the data bytes are transferred to the receive buffer instead:

$$\begin{aligned} \mathsf{U} &= \mathsf{U} + head(\mathsf{D});\ \mathsf{D} = tail(\mathsf{D});\ \text{when} \\ &\quad (|\mathsf{D}| > 0 \land \mathsf{U} + head(\mathsf{D}) < min(0.35 \cdot R, 2 \cdot \text{MSS})) \end{aligned}$$

*Out-of-order acknowledgement:* According to RFC 813, when data is received without immediate acknowledgement, a 200 ms timer should be started to acknowledge the data if no acknowledgement packet is generated in the meantime. However, as discussed in [28], regularly rescheduling such a timer can be an expensive operation. Therefore a cyclic timer acknowledging all received data every 200 ms is often used instead. In our model, we include this design decision by allowing one non-deterministic out-of-order acknowledgement packet to occur:

$$\mathsf{A} = append(\mathsf{A}, \mathsf{U});\ \mathsf{U} = 0;\ \mathsf{ACK} = 1;\ \text{when } (\mathsf{U} > 0 \land \mathsf{ACK} = 0)$$

*Process acknowledgement:* The data cannot be removed from the send buffer until they are acknowledged. Thus, whenever there is an acknowledgement packet in transit, the packet can be processed by the sender:

$$\mathsf{A} = tail(\mathsf{A});\ \text{when } |\mathsf{A}| > 0$$
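The events above can be sketched as an explicit-state successor function. This is a minimal illustration under several assumptions rather than the authors' tool: the state is read as a tuple (W, D, U, A, ACK), where W is the number of unsent bytes in the send buffer, D and A are the sequences of data and acknowledgement packets in transit, U is the number of received but unacknowledged bytes, and ACK records whether the out-of-order acknowledgement was already used. This reading is inferred from the event definitions. The sketch also assumes that a partial packet must be non-empty. The function `deadlocks` collects reachable states without successors, a simple stand-in for the single-state tSCCs discussed in the analysis.

```python
from collections import deque
from typing import Set, Tuple

# (W, D, U, A, ACK); the meaning of each component is an assumption, see the text above.
State = Tuple[int, Tuple[int, ...], int, Tuple[int, ...], int]


def outstanding(state: State) -> int:
    """Unacknowledged bytes: U plus all bytes in the D and A channels."""
    _, D, U, A, _ = state
    return U + sum(D) + sum(A)


def successors(state: State, S: int, R: int, MSS: int) -> Set[State]:
    W, D, U, A, ACK = state
    out = outstanding(state)
    window = min(S, R) - out
    ack_threshold = min(0.35 * R, 2 * MSS)
    result: Set[State] = set()
    # Copy data from the application: the maximal k in [1..4] that still fits.
    fitting = [k for k in range(1, 5) if k * 1024 + W + out <= S]
    if fitting:
        result.add((W + max(fitting) * 1024, D, U, A, ACK))
    # Send full packet.
    if window >= MSS and W >= MSS:
        result.add((W - MSS, D + (MSS,), U, A, ACK))
    # Send partial packet (Nagle); requiring packet > 0 is an assumption of this sketch.
    packet = min(window, MSS, W)
    if out == 0 and packet > 0:
        result.add((W - packet, D + (packet,), U, A, ACK))
    # Receive the head of D, with or without an acknowledgement.
    if D:
        head, tail = D[0], D[1:]
        if U + head >= ack_threshold:
            result.add((W, tail, 0, A + (U + head,), ACK))
        else:
            result.add((W, tail, U + head, A, ACK))
    # One non-deterministic out-of-order acknowledgement.
    if U > 0 and ACK == 0:
        result.add((W, D, 0, A + (U,), 1))
    # Process acknowledgement.
    if A:
        result.add((W, D, U, A[1:], ACK))
    return result


def deadlocks(S: int, R: int, MSS: int) -> Set[State]:
    """Reachable states with no enabled event, found by breadth-first search."""
    init: State = (0, (), 0, (), 0)
    seen, queue, dead = {init}, deque([init]), set()
    while queue:
        s = queue.popleft()
        nxt = successors(s, S, R, MSS)
        if not nxt:
            dead.add(s)
        for n in nxt - seen:
            seen.add(n)
            queue.append(n)
    return dead
```

For instance, with S = 4096, R = 65536, and MSS = 9204, the search ends in a state where 4096 bytes sit unacknowledged at the receiver while Nagle's condition blocks the sender; only a time-out outside the model would resolve it.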

**Fig. 5.** The bifurcation diagrams showing the character of tSCCs depending on the model parameters in the TCP model. The white space indicates a single large tSCC; the other colours indicate the regions displaying various types of single state tSCCs. (a) MSS = 9204, 1 KiB increments of S and R; (b) MSS = 9204, 8 KiB increments of S and R; (c) MSS = 1460, 1 KiB increments of S and R. (Color figure online)

**Analysis Results.** In our analysis, we assume the buffer sizes S and R ranging from 1 KiB to 64 KiB in 1 KiB increments. First, we consider MSS to be 9204, as in [28]. This MSS configuration corresponds to a specific high-performance network and is not used in typical Ethernet configurations.

The complete results of our analysis are presented in Fig. 5a. In contrast to the previous case study, we consider the presence of a single large tSCC as the desired behaviour (depicted in white). In this case, the situation indicates that the protocol is functioning properly. On the other hand, the presence of a small, single-state tSCC means that the protocol cannot continue transmitting and is waiting for a time-out to resolve the problematic situation.

Additionally, based on enabling and disabling various extensions of the protocol model, we can distinguish between different bifurcation causes:


In the case of S < R, the achieved results are in line with the findings of [28]. However, in the R < S area, we observe a bifurcation caused by the interplay of delayed acknowledgement and Nagle's algorithm which has not been considered in the original paper. This bifurcation is caused by small packets sent right after an acknowledgement is received. The small packet is transmitted after the acknowledgement clears the outstanding bytes (so Nagle's condition holds), but before more data is copied into the send buffer (before the acknowledgement was received, the send buffer was full).

In [28], the situation might have been avoided by some undisclosed implementation or timing aspects. However, another possible explanation is that this behaviour has been overlooked because such issues never occurred during the experiments. In Fig. 5b, we present our reconstruction of the same results, but in 8 KiB increments. It corresponds exactly to the experimental evaluation presented in [28]. The described behaviour is absent in this case, since the 8 KiB increments avoid the problematic region entirely.

Finally, in Fig. 5c, we present the same analysis for the maximal buffer size of 32 KiB and MSS of 1460 bytes, which is the typical setting on an Ethernet network. In this case, the red region is completely absent, and while other bifurcations are still present, the problematic regions are much smaller due to the smaller MSS. This puts into perspective the drastic behavioural changes present for larger MSS values and shows how bifurcations can emerge in unexpected situations.

#### **4 Discussion and Conclusion**

In this paper, we have presented two case studies demonstrating a promising application of the digital bifurcation analysis in the domain of network protocols. To that end, we have utilised the methodology developed in our previous work.

The key aspects of the method as applied in this paper are the following. First, it gives rigorous results concerning the given models of the studied protocol. Second, it can be performed fully automatically. In general, the only tasks that have to be done manually are to acquire a suitable model and to post-process the results (incl. visualisation and interpretation). The crucial step to be done within the latter task is to classify the studied protocol properties in terms of attractors. However, this can be easily automated since the interest of a network administrator (or a designer) is primarily focused on parameter values for which the stable behaviour (a single simple attractor) disappears.

Both case studies show that digital bifurcation analysis provides a methodologically different view of protocol analysis than formal verification or testing. This is made possible by the global view of the protocol behaviour with respect to parameters. Owing to this global approach, in the second case study, we have revealed regions in bifurcation diagrams that were omitted in previous studies.

The push-button characteristics of the digital bifurcation analysis allow making the results easily reproducible. All steps necessary to reconstruct both case studies are publicly available<sup>1</sup>.

For future work, our primary intention is to target similar, but not yet fully explored, problems in network protocols using digital bifurcation analysis that will allow further fine-tuning (and generalisation) of the presented workflow.

### **References**


<sup>1</sup> https://github.com/sybila/tcp-bifurcation.


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **Verifying Safety of Synchronous Fault-Tolerant Algorithms by Bounded Model Checking**

Ilina Stoilkovska<sup>1(✉)</sup>, Igor Konnov<sup>2</sup>, Josef Widder<sup>1</sup>, and Florian Zuleger<sup>1</sup>

> <sup>1</sup> TU Wien, Vienna, Austria {stoilkov,widder,zuleger}@forsyte.at <sup>2</sup> University of Lorraine, CNRS, Inria, LORIA, Nancy, France igor.konnov@inria.fr

**Abstract.** Many fault-tolerant distributed algorithms are designed for synchronous or round-based semantics. In this paper, we introduce the synchronous variant of threshold automata, and study their applicability and limitations for the verification of synchronous distributed algorithms. We show that in general, the reachability problem is undecidable for synchronous threshold automata. Still, we show that many synchronous fault-tolerant distributed algorithms have a bounded diameter, although the algorithms are parameterized by the number of processes. Hence, we use bounded model checking for verifying these algorithms.

The existence of bounded diameters is the main conceptual insight in this paper. We compute the diameter of several algorithms and check their safety properties, using SMT queries that contain quantifiers for dealing with the parameters symbolically. Surprisingly, performance of the SMT solvers on these queries is very good, reflecting the recent progress in dealing with quantified queries. We found that the diameter bounds of synchronous algorithms in the literature are tiny (from 1 to 4), which makes our approach applicable in practice. For a specific class of algorithms we also establish a theoretical result on the existence of a diameter, providing a first explanation for our experimental results. The encodings of our benchmarks and instructions on how to run the experiments are available at: [33].

### **1 Introduction**

Fault-tolerant distributed algorithms are hard to design and verify. Recently, threshold automata were introduced to model, verify and synthesize asynchronous fault-tolerant distributed algorithms [19,21,24]. Owing to the well-known impossibility result [18], many distributed computing problems, including consensus, are not solvable in purely asynchronous systems. Thus, synchronous distributed algorithms have been extensively studied [5,26]. In this paper, we introduce *synchronous* threshold automata, and investigate their applicability and limitations for verification of synchronous fault-tolerant distributed algorithms.

Partially supported by: Austrian Science Fund (FWF) via NFN RiSE (S11403, S11405), project PRAVDA (P27722), and doctoral college LogiCS W1255; Vienna Science and Technology Fund (WWTF) grant APALACHE (ICT15-103).

```
1  int v := input({0, 1})
2  bool accept := false
3  while (true) do {  // in one synchronous step
4    if (v = 1) then broadcast <ECHO>;
5    receive messages from other processes;
6    if received <ECHO> from ≥ t + 1 processes
7      then v := 1;
8    if received <ECHO> from ≥ n − t processes
9      then accept := true;
10 }
```

[Figure: STA with locations v0, v1, se, and ac, and rules labelled r1: φ3, r2: φ1, r3: true, r4: φ4, r5: φ2, r6: true, r7: φ2, r8: φ2.]

**Fig. 1.** Pseudo code of synchronous reliable broadcast à la [32], and its STA, with guards: $\varphi_1 \equiv \#\{\text{v1}, \text{se}, \text{ac}\} \ge t + 1 - f$ and $\varphi_2 \equiv \#\{\text{v1}, \text{se}, \text{ac}\} \ge n - t - f$ and $\varphi_3 \equiv \#\{\text{v1}, \text{se}, \text{ac}\} < t + 1$ and $\varphi_4 \equiv \#\{\text{v1}, \text{se}, \text{ac}\} < n - t$.

An example of such a synchronous threshold automaton is given in Fig. 1 on the right; it encodes the synchronous reliable broadcast algorithm from [32]. (The pseudo code is in Fig. 1 on the left.) Its semantics is defined in terms of a counter system. For each location $\ell_i \in \{\text{v0}, \text{v1}, \text{se}, \text{ac}\}$ (a node in the graph), we have a counter $\kappa_i$ that stores the number of processes that are in $\ell_i$. The system is parameterized in two ways: (i) in the number of processes n, the number of faults f, and the upper bound on the number of faults t, (ii) the expressions in the guards contain n, t, and f. Every transition moves all processes simultaneously, potentially using a different rule for each process (depicted by an edge in the figure), provided that the rule guards evaluate to true. The guards compare a sum of counters to a linear combination of parameters. For example, the guard $\varphi_1 \equiv \#\{\text{v1}, \text{se}, \text{ac}\} \ge t + 1 - f$ evaluates to true if the number of processes that are either in location v1, se, or ac is greater than or equal to $t + 1 - f$.

**Table 1.** A long execution of reliable broadcast and the short representative

Synchronous Threshold Automata (STA) model synchronous fault-tolerant distributed algorithms as follows. As processes send messages based on their current locations, we use the number of processes in given locations to test how many messages of a certain type have been sent. However, the pseudo code in Fig. 1 is predicated on received messages rather than on sent messages. This algorithm is designed to tolerate Byzantine-faulty processes, which may send spurious messages to some correct processes. Thus, the number of received messages may deviate from the number of correct processes that sent a message. For example, if the guard in line 7 evaluates to true, the t + 1 received messages may contain up to f messages from faulty processes. If i correct processes send <ECHO>, for 1 ≤ i ≤ t, the faulty processes may "help" some correct processes to pass over the t + 1 threshold. In the STA, this is modeled by both the rules $r_1$ and $r_2$ being enabled. Thus, the assignment v:=1 in line 7 is modeled by the rule $r_2$, guarded by $\varphi_1$. The implicit "else" branch between lines 7 and 8 is modeled by the rule $r_1$, guarded by $\varphi_3$. As the effect of the f faulty processes on the correct processes is captured by the guards, we model only the correct processes explicitly, so that a system consists of n − f copies of the STA.
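To make the overlap of the two thresholds concrete, the following Python sketch evaluates the four guards from the caption of Fig. 1 on a given vector of counters. The function name and the dictionary encoding of counters are assumptions made for this illustration.

```python
from typing import Dict


def guards(counters: Dict[str, int], n: int, t: int, f: int) -> Dict[str, bool]:
    """Evaluate the guards of Fig. 1 on counters over the locations v0, v1, se, ac."""
    # Number of correct processes that send <ECHO> in the current round.
    senders = counters.get("v1", 0) + counters.get("se", 0) + counters.get("ac", 0)
    return {
        "phi1": senders >= t + 1 - f,  # faulty processes may supply up to f messages
        "phi2": senders >= n - t - f,
        "phi3": senders < t + 1,       # correct senders alone are not enough
        "phi4": senders < n - t,
    }
```

With n = 4, t = 1, f = 1 and one of the three correct processes in v1, both the guard with threshold t + 1 − f and the guard "fewer than t + 1" hold. A process in v0 can therefore either stay in v0 or move to se, which is exactly the non-determinism introduced by the Byzantine processes.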

*Contributions.* We start by introducing synchronous threshold automata (STA) and the counter systems they define.


#### **2 Overview of Our Approach**

*Bounded Diameter.* Consider Fig. 1: the processes execute the send, receive, and local computation steps in lock-step. One iteration of the loop is expressed as an STA edge that connects the locations before and after an iteration (i.e., the STA models the loop body of the pseudo code). The location se encodes that v = 1 and accept is false. That is, se is the location in which processes send <ECHO> in every round. If a process sets accept to true, it goes to location ac. The location where v is 1 is encoded by v1, and the one where v is 0 by v0.

An example execution is depicted in Table 1 on the left. We run n − f copies of the STA in Fig. 1. Observe that the guards of the rules $r_1$ and $r_2$ are both enabled in the configuration $\sigma_0$. One STA uses $r_2$ to go to se while the others use the self-loop $r_1$ to stay in v0. As both rules remain enabled, in every round one more automaton can go to se. Hence, configuration $\sigma_{t+1}$ has t + 1 correct STA in location se and rule $r_1$ becomes disabled. Then, all remaining STA go to se and then finally to ac. This execution depends on the parameter t, which implies that the length of this execution is unbounded for increasing values of the parameter t. (We note that we can obtain longer executions if some STA use rule $r_4$.) On the right, we see an execution where all STA take $r_2$ immediately. That is, while configuration $\sigma_{t+3}$ is reached by a long execution on the left, it is reached in just two steps on the right (observe $\sigma'_2 = \sigma_{t+3}$). We are interested in whether there is a natural number k (which does not depend on the parameters n, t and f) such that we can always shorten executions to executions of length ≤ k. (By length, we mean the number of transitions in an execution.) In such a case we say that the STA has *bounded diameter*. In Sect. 5.1 we introduce an SMT-based procedure that enumerates candidates for the diameter bound and checks if the candidate is indeed the diameter; if it finds such a bound, it terminates. For the STA in Fig. 1, this procedure computes the diameter 2.

*Threshold Automata with Traps.* In Sect. 5.2, we define a fragment of STA for which we theoretically guarantee a bounded diameter. For example, the STA in Fig. 1 falls in this fragment, and we obtain a guaranteed diameter of ≤8. The fragment is defined by two conditions: (i) The STA has a structure that implies monotonicity of the guards: the set of locations that are used in the guards (e.g., {v1, se, ac}) is closed under the rules, i.e., from each location within the set, the STA can reach only a location in the set. We call guards that have this property *trapped*. (ii) The STA has no cycles, except possibly self-loops.

*Bounded Model Checking, Completeness and (Un-)Decidability.* The existence of a bounded diameter motivates the use of bounded model checking for verifying safety properties. In Sect. 6 we give an SMT encoding for checking the violation of a safety property by executions with length up to the diameter. Crucially, this approach is complete because if an execution reaches a bad configuration, this bad configuration is already reached by an execution of bounded length. We observe that for the STA defined in this paper (with linear guards and linear constraints on the parameters), the SMT encoding results in a Presburger arithmetic formula (with one quantifier alternation). Hence, checking safety properties (that can be expressed in Presburger arithmetic) is decidable for STA with bounded diameter. We also experimentally demonstrate in Sect. 7 that current SMT solvers can handle these quantified formulae well. In contrast, we show in Sect. 4 that the parameterized reachability problem is undecidable for general STA. This implies that there are STA with unbounded diameter.

**Fig. 2.** Pseudo code of *FloodMin* from [13], and STA encoding its loop body, for k = 1, with guards: $\varphi_1 \equiv \#\{\text{v0}, \text{c0}\} > 0$ and $\varphi_2 \equiv \#\{\text{v0}\} = 0$.

*Threshold Automata with Untrapped Guards.* The *FloodMin* algorithm in Fig. 2 solves the k-set agreement problem. This algorithm is run by n replicated processes, up to t of which may fail by crashing. For simplicity of presentation, we consider the case when k = 1, which turns k-set agreement into consensus. In Fig. 2, on the right, we have the STA that captures the loop body. The locations c0 and c1 correspond to the case when a process is crashing in the current round and may manage to send the value 0 and 1, respectively; the process remains in the crashed location "✖" and does not send any messages starting with the next round. We observe that the guard $\#\{\text{v0}, \text{c0}\} > 0$ is not trapped, and our result about trapped guards does not apply. Nevertheless, our SMT-based procedure can find a diameter of 2. In the same way, we automatically found a bound on the diameter for several benchmarks from the literature. It is remarkable that the diameter for the transition relation of the loop body (without the loop condition) is bounded by a constant, independent of the parameters.

*Bounded Model Checking of Algorithms with Clean Rounds.* The number of loop iterations $\lfloor t/k \rfloor + 1$ of the *FloodMin* algorithm has been designed such that it ensures (together with the environment assumption of at most t crashes) that there is at least one *clean* round in which at most k − 1 processes crashed. The correctness of the *FloodMin* algorithm relies on the occurrence of such a clean round. We make use of the existence of clean rounds by employing the following two-step methodology for the verification of safety properties: (i) we find all reachable clean-round configurations, and (ii) check if a bad configuration is reachable from those configurations. A detailed description of this methodology can be found in Sect. 6. Our method requires the encoding of a clean round as input (e.g., for Fig. 2 that no STA are in c0 and c1). We leave detecting and encoding clean rounds automatically from the fault environment for future work.
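The counting argument behind the clean round is a standard pigeonhole step, spelled out here under the assumption that the loop runs for $\lfloor t/k \rfloor + 1$ rounds and that at most $t$ processes crash overall. Suppose, for contradiction, that at least $k$ processes crash in every round. Since $\lfloor t/k \rfloor + 1 > t/k$, the total number of crashes would then be at least

$$k \cdot \left( \left\lfloor \frac{t}{k} \right\rfloor + 1 \right) > k \cdot \frac{t}{k} = t,$$

which contradicts the bound of at most $t$ crashes. Hence, at least one round has at most $k - 1$ crashes. For $k = 1$ this gives $t + 1$ rounds, one of which is free of crashes.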

#### **3 Synchronous Threshold Automata**

We introduce the syntax of synchronous threshold automata and give some intuition of the semantics, which we will formalize as counter systems below.

A *synchronous threshold automaton* is the tuple $\textit{STA} = (\mathcal{L}, \mathcal{I}, \Pi, \mathcal{R}, RC, \chi)$, where $\mathcal{L}$ is a finite set of locations, $\mathcal{I} \subseteq \mathcal{L}$ is a non-empty set of initial locations, $\Pi$ is a finite set of parameters, $\mathcal{R}$ is a finite set of rules, $RC$ is a resilience condition, and $\chi$ is a counter invariant, defined in the following. We assume that the set $\Pi$ of parameters contains at least the parameter $n$, denoting the number of processes. We call the vector $\boldsymbol{\pi} = \langle \pi_1, \ldots, \pi_{|\Pi|} \rangle$ the *parameter vector*, and a vector $\mathbf{p} = \langle p_1, \ldots, p_{|\Pi|} \rangle$ is an *instance of* $\boldsymbol{\pi}$, where $\pi_i \in \Pi$ is a parameter, and $p_i \in \mathbb{N}$ is a natural number, for $1 \le i \le |\Pi|$, such that $\mathbf{p}[\pi_i] = p_i$ is the value assigned to the parameter $\pi_i$ in the instance $\mathbf{p}$ of $\boldsymbol{\pi}$. The set of *admissible instances of* $\boldsymbol{\pi}$ is defined as $\mathbf{P}_{RC} = \{\mathbf{p} \in \mathbb{N}^{|\Pi|} \mid \mathbf{p} \text{ is an instance of } \boldsymbol{\pi} \text{ and } \mathbf{p} \text{ satisfies } RC\}$. The mapping $N : \mathbf{P}_{RC} \to \mathbb{N}$ maps an admissible instance $\mathbf{p} \in \mathbf{P}_{RC}$ to the number $N(\mathbf{p})$ of processes that participate in the algorithm, such that $N(\mathbf{p})$ is a linear combination of the parameter values in $\mathbf{p}$.

For example, for the STA in Fig. 1, $RC \equiv n > 3t \land t \ge f$, hence a vector $\mathbf{p} \in \mathbb{N}^{|\Pi|}$ is an admissible instance of the parameter vector $\boldsymbol{\pi} = \langle n, t, f \rangle$ if $\mathbf{p}[n] > 3\mathbf{p}[t] \land \mathbf{p}[t] \ge \mathbf{p}[f]$. Furthermore, for this STA, $N(\mathbf{p}) = \mathbf{p}[n] - \mathbf{p}[f]$. For the STA in Fig. 2, $RC \equiv n > t \land t \ge f$, hence the admissible instances satisfy $\mathbf{p}[n] > \mathbf{p}[t] \land \mathbf{p}[t] \ge \mathbf{p}[f]$, and we have $N(\mathbf{p}) = \mathbf{p}[n]$.
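The two resilience conditions and the mapping N can be written down directly. The sketch below is only an illustration; the function names and the bounded enumeration helper are assumptions, since the set of admissible instances is infinite in general.

```python
from itertools import product
from typing import Callable, List, Tuple


def admissible_fig1(n: int, t: int, f: int) -> bool:
    """RC for the STA in Fig. 1: n > 3t and t >= f."""
    return n > 3 * t and t >= f


def admissible_fig2(n: int, t: int, f: int) -> bool:
    """RC for the STA in Fig. 2: n > t and t >= f."""
    return n > t and t >= f


def processes_fig1(n: int, t: int, f: int) -> int:
    """N(p) for Fig. 1: only the n - f correct processes are modelled."""
    return n - f


def processes_fig2(n: int, t: int, f: int) -> int:
    """N(p) for Fig. 2: all n processes are modelled."""
    return n


def admissible_instances(
    rc: Callable[[int, int, int], bool], bound: int
) -> List[Tuple[int, int, int]]:
    """All admissible (n, t, f) with every component in 0..bound (hypothetical helper)."""
    values = range(bound + 1)
    return [(n, t, f) for n, t, f in product(values, repeat=3) if rc(n, t, f)]
```

Enumerating small instances in this way is only useful for experimentation; the approach of the paper treats the parameters symbolically instead.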

We introduce *counter atoms* of the form $\psi \equiv \#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b$, where $L \subseteq \mathcal{L}$ is a set of locations, $\#L$ denotes the total number of processes currently in the locations $\ell \in L$, $\boldsymbol{a} \in \mathbb{Z}^{|\Pi|}$ is a vector of coefficients, $\boldsymbol{\pi}$ is the parameter vector, and $b \in \mathbb{Z}$. We will use the counter atoms for expressing guards and predicates in the verification problem. In the following, we will use two abbreviations: $\#L = \boldsymbol{a} \cdot \boldsymbol{\pi} + b$ for the formula $(\#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b) \land \neg(\#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b + 1)$, and $\#L > \boldsymbol{a} \cdot \boldsymbol{\pi} + b$ for the formula $\#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b + 1$.

A *rule* $r \in \mathcal{R}$ is the tuple $(\textit{from}, \textit{to}, \varphi)$, where $\textit{from}, \textit{to} \in \mathcal{L}$ are locations, and $\varphi$ is a guard whose truth value determines if the rule $r$ is executed. The guard $\varphi$ is a Boolean combination of counter atoms. We denote by $\Psi$ the set of counter atoms occurring in the guards of the rules $r \in \mathcal{R}$.

The *counter invariant* $\chi$ is a Boolean combination of counter atoms $\#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b$, where each atom occurring in $\chi$ restricts the number of processes allowed to populate the locations in $L \subseteq \mathcal{L}$.

*Counter Systems.* The counter atoms are evaluated over tuples $(\kappa, \mathbf{p})$, where $\kappa \in \mathbb{N}^{|\mathcal{L}|}$ is a vector of *counters*, and $\mathbf{p} \in \mathbf{P}_{RC}$ is an admissible instance of $\boldsymbol{\pi}$. For a location $\ell \in \mathcal{L}$, the counter $\kappa[\ell]$ denotes the number of processes that are currently in the location $\ell$. A counter atom $\psi \equiv \#L \ge \boldsymbol{a} \cdot \boldsymbol{\pi} + b$ is *satisfied* in the tuple $(\kappa, \mathbf{p})$, that is $(\kappa, \mathbf{p}) \models \psi$, iff $\sum_{\ell \in L} \kappa[\ell] \ge \boldsymbol{a} \cdot \mathbf{p} + b$. The semantics of the Boolean connectives is standard.
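The satisfaction relation for counter atoms, together with the two abbreviations for equality and strict inequality, can be sketched in a few lines of Python. The class and helper names are hypothetical, and parameters and locations are encoded by name in dictionaries rather than by position in vectors.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class CounterAtom:
    """The atom #L >= a . pi + b, with a given as a map from parameter names to coefficients."""
    locations: FrozenSet[str]
    coeffs: Dict[str, int]
    const: int

    def holds(self, kappa: Dict[str, int], p: Dict[str, int]) -> bool:
        """(kappa, p) |= psi iff the counters of L sum to at least a . p + b."""
        lhs = sum(kappa.get(loc, 0) for loc in self.locations)
        rhs = sum(a * p[name] for name, a in self.coeffs.items()) + self.const
        return lhs >= rhs


def greater(atom: CounterAtom, kappa: Dict[str, int], p: Dict[str, int]) -> bool:
    """#L > a . pi + b abbreviates #L >= a . pi + b + 1."""
    shifted = CounterAtom(atom.locations, atom.coeffs, atom.const + 1)
    return shifted.holds(kappa, p)


def equal(atom: CounterAtom, kappa: Dict[str, int], p: Dict[str, int]) -> bool:
    """#L = a . pi + b abbreviates (#L >= a . pi + b) and not (#L >= a . pi + b + 1)."""
    return atom.holds(kappa, p) and not greater(atom, kappa, p)
```

For example, the atom with locations {v1, se, ac}, coefficients t: 1 and f: −1, and constant 1 encodes the guard with threshold t + 1 − f from Fig. 1.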

<sup>A</sup> *transition* is a function <sup>t</sup> : R → <sup>N</sup> that maps a rule <sup>r</sup> ∈ R to a factor <sup>t</sup>(r) <sup>∈</sup> <sup>N</sup>, denoting the number of processes that act upon this rule. Given an instance **p** of *π*, we denote by T(**p**) the set {t | - <sup>r</sup>∈R <sup>t</sup>(r) = <sup>N</sup>(**p**)} of transitions whose rule factors sum up to N(**p**).
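The satisfaction relation for counter atoms and the membership test for T(**p**) can be sketched as follows. This is an illustration under stated assumptions: the class `CounterAtom`, the function `in_T`, and the dictionary encodings of κ, **p** and t are hypothetical names, and the atom used in the usage note is an invented example rather than a guard taken from Fig. 1.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet


@dataclass
class CounterAtom:
    """Counter atom #L >= a . pi + b (hypothetical encoding)."""
    locations: FrozenSet[str]  # the set L of locations
    coeffs: Dict[str, int]     # the vector a, indexed by parameter name
    const: int                 # the constant b

    def holds(self, kappa: Dict[str, int], p: Dict[str, int]) -> bool:
        # Left-hand side: total number of processes in the locations of L.
        lhs = sum(kappa[loc] for loc in self.locations)
        # Right-hand side: a . p + b.
        rhs = sum(c * p[name] for name, c in self.coeffs.items()) + self.const
        return lhs >= rhs


def in_T(t: Dict[str, int], num_processes: int) -> bool:
    """t maps rule names to factors; t is in T(p) iff the factors sum to N(p)."""
    return all(f >= 0 for f in t.values()) and sum(t.values()) == num_processes
```

With the invented atom #{v1, se} ≥ t + 1 and **p** = (n = 4, t = 1, f = 1), the atom holds when two processes are in v1 and none in se, and fails when only one process is in v1.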

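Given a tuple (κ, **p**) and a transition t, we say that t is *enabled* in (κ, **p**), if

- for every r ∈ R such that t(r) > 0, it holds that (κ, **p**) |= r.ϕ, and
- for every ℓ ∈ L, it holds that κ[ℓ] = Σ<sub>r∈R ∧ r.*from*=ℓ</sub> t(r).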

The first condition ensures that processes only use rules whose guards are satisfied, and the second that every process moves in an enabled transition.

Observe that each transition t ∈ T(**p**) defines a unique tuple (κ, **p**) in which it is enabled. We call the *origin* of a transition t ∈ T(**p**) the tuple o(t) = (κ, **p**), such that for every ℓ ∈ L, we have o(t).κ[ℓ] = Σ<sub>r∈R ∧ r.*from*=ℓ</sub> t(r). Similarly, each transition defines a unique tuple (κ, **p**) that is the result of applying the transition in its origin. We call the *goal* of a transition t ∈ T(**p**) the tuple g(t) = (κ, **p**), such that for every ℓ ∈ L, we have g(t).κ[ℓ] = Σ<sub>r∈R ∧ r.*to*=ℓ</sub> t(r).
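The origin and the goal of a transition are plain sums of rule factors, grouped by source and by target location. A minimal sketch, assuming a hypothetical `Rule` record and a transition given as a dictionary from rules to factors:

```python
from collections import Counter
from typing import Dict, NamedTuple


class Rule(NamedTuple):
    name: str  # hypothetical label, used only to tell rules apart
    src: str   # r.from
    dst: str   # r.to


def origin(t: Dict[Rule, int]) -> Counter:
    """Counters of o(t): for each location, the sum of t(r) over rules leaving it."""
    kappa = Counter()
    for rule, factor in t.items():
        kappa[rule.src] += factor
    return kappa


def goal(t: Dict[Rule, int]) -> Counter:
    """Counters of g(t): for each location, the sum of t(r) over rules entering it."""
    kappa = Counter()
    for rule, factor in t.items():
        kappa[rule.dst] += factor
    return kappa
```

For example, if two processes take a self-loop on v0 and one process moves from v1 to se, the origin has counters v0 = 2, v1 = 1 and the goal has counters v0 = 2, se = 1.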

We now define a counter system, for a given *STA* = (L, I,Π, R, RC, χ), and an admissible instance **p** ∈ PRC of the parameter vector *π*.

**Definition 1.** *A* counter system *w.r.t. STA* = (L, I,Π, R, RC, χ) *and an admissible instance* **p** ∈ PRC *is the tuple* CS(*STA*, **p**)=(Σ(**p**), I(**p**), R(**p**))*, where*


We restrict ourselves to deadlock-free counter systems, i.e., counter systems where the transition relation is total (every configuration has a successor). A sufficient condition for deadlock-freedom is that for every location ℓ ∈ L, it holds that χ → ⋁<sub>r∈R ∧ r.*from*=ℓ</sub> r.ϕ. This ensures that it is always possible to move out of every location, as there is at least one outgoing rule per location whose guard is satisfied.

To simplify the notation, in the following we write σ[ℓ] to denote σ.κ[ℓ].

*Paths and Schedules in a Counter System.* We now define paths and schedules of a counter system, as sequences of configurations and transitions, respectively.

**Definition 2.** *A* path *in the counter system* CS(*STA*, **p**) = (Σ(**p**), I(**p**), R(**p**)) *is a finite sequence* {σ<sub>i</sub>}<sub>i=0</sub><sup>k</sup> *of configurations, such that for every two consecutive configurations* σ<sub>i−1</sub>, σ<sub>i</sub>*, for* 0 < i ≤ k*, there exists a transition* t<sub>i</sub> ∈ T(**p**) *such that* σ<sub>i−1</sub> →<sup>t<sub>i</sub></sup> σ<sub>i</sub>*. A path* {σ<sub>i</sub>}<sub>i=0</sub><sup>k</sup> *is called an* execution *if* σ<sub>0</sub> ∈ I(**p**)*.*

**Definition 3.** *A* schedule *is a finite sequence* τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> *of transitions* t<sub>i</sub> ∈ T(**p**)*, for* 0 < i ≤ k*. We denote by* |τ| = k *the length of the schedule* τ*.*

*A schedule* τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> *is* feasible *if there is a path* {σ<sub>i</sub>}<sub>i=0</sub><sup>k</sup> *such that* σ<sub>i−1</sub> →<sup>t<sub>i</sub></sup> σ<sub>i</sub>*, for* 0 < i ≤ k*. We call* σ<sub>0</sub> *the* origin*, and* σ<sub>k</sub> *the* goal *of* τ*, and write* σ<sub>0</sub> →<sup>τ</sup> σ<sub>k</sub>*.*

### **4 Parameterized Reachability and Its Undecidability**

We show that the following problem is undecidable in general, by reduction from the halting problem of a two-counter machine (2CM) [28]. Such reductions are common in parameterized verification, e.g., see [12].

**Definition 4 (Parameterized Reachability).** *Given a formula* ϕ*, that is, a Boolean combination of counter atoms, and STA* = (L, I, Π, R, RC, χ)*, the* parameterized reachability *problem is to decide whether there exists an admissible instance* **p** ∈ P<sub>RC</sub>*, such that in the counter system* CS(*STA*, **p**)*, there is an initial configuration* σ ∈ I(**p**)*, and a feasible schedule* τ*, with* σ →<sup>τ</sup> σ′ *and* σ′ |= ϕ*.*

To prove undecidability, we construct a synchronous threshold automaton *STA*<sub>M</sub>, such that every counter system induced by it simulates the steps of a 2CM executing a program P. The STA has a single parameter – the number n of processes, and the invariant χ = *true*. The idea is that each process plays one of two roles: either it is used to encode the control flow of the program P (*controller* role), or to encode the values of the registers in unary, as in [17] (*storage* role). Thus, *STA*<sub>M</sub> consists of two parts – one for each role.

Our construction allows multiple processes to act as controllers. Since we assume that the 2CM is deterministic, all the controllers behave the same. For each instruction of the program P, in the controller part of *STA*<sub>M</sub>, there is a single location (for 'jump if zero' and 'halt') or a pair of locations (for 'increment' and 'decrement'), and a special *stuck* location. In the storage part of *STA*<sub>M</sub>, there is a location for each register, a store location, and auxiliary locations. The number of processes in a register location encodes the value of the register in the 2CM.

An increment (resp. decrement) of a register is modeled by moving one process from (resp. to) the store location to (resp. from) the register location. The guards on the rules in the controller part check if the storage processes made a transition that truly models a step of the 2CM; in this case, the controllers move on to the next location, otherwise they move to the stuck location. For example, to model a 'jump if zero' for register A, the controllers check if #{ℓ<sub>A</sub>} = 0, where ℓ<sub>A</sub> is the storage location corresponding to register A. The main invariant which ensures correctness is that every transition in every counter system induced by *STA*<sub>M</sub> either faithfully simulates a step of the 2CM, or moves all of the controllers to the stuck location.

Let *halt* be the halting location in the controller part of *STA*<sub>M</sub>. The formula ϕ ≡ ¬(#{*halt*} = 0) states that the controllers have reached the halting location. Thus, the answer to the parameterized reachability question given the formula ϕ and *STA*<sub>M</sub> is positive iff the 2CM halts, which gives us undecidability.

### **5 Bounded Diameter Oracle**

#### **5.1 Computing the Diameter Using SMT**

Given an STA, the diameter is the maximal number of transitions needed to reach all possible configurations in every counter system induced by the STA and an admissible instance **p** ∈ P<sub>RC</sub>. We adapt the definition of diameter from [11].

**Definition 5 (Diameter).** *Given an STA* = (L, I, Π, R, RC, χ)*, the* diameter *is the smallest number* d *such that for every* **p** ∈ P<sub>RC</sub> *and every path* {σ<sub>i</sub>}<sub>i=0</sub><sup>d+1</sup> *of length* d + 1 *in* CS(*STA*, **p**)*, there exists a path* {σ′<sub>j</sub>}<sub>j=0</sub><sup>e</sup> *of length* e ≤ d *in* CS(*STA*, **p**)*, such that* σ<sub>0</sub> = σ′<sub>0</sub> *and* σ<sub>d+1</sub> = σ′<sub>e</sub>*.*

Thus, the diameter is the smallest number d that satisfies the formula:

$$\begin{aligned} &\forall \mathbf{p} \in P_{RC}.\ \forall \sigma_0, \dots, \sigma_{d+1}.\ \forall t_1, \dots, t_{d+1}.\ \exists \sigma'_0, \dots, \sigma'_d.\ \exists t'_1, \dots, t'_d. \\ &\quad \mathit{Path}(\sigma_0, \sigma_{d+1}, d+1) \to (\sigma_0 = \sigma'_0) \land \mathit{Path}(\sigma'_0, \sigma'_d, d) \land \bigvee_{i=0}^{d} \sigma'_i = \sigma_{d+1} \end{aligned} \tag{1}$$

where *Path*(σ<sub>0</sub>, σ<sub>d</sub>, d) is a shorthand for the formula ⋀<sub>i=0</sub><sup>d−1</sup> R(σ<sub>i</sub>, t<sub>i+1</sub>, σ<sub>i+1</sub>), and R(σ, t, σ′) is a predicate which evaluates to true whenever σ →<sup>t</sup> σ′. Since we assume deadlock-freedom, we are able to encode the path *Path*(σ′<sub>0</sub>, σ′<sub>d</sub>, d) of length d, even if the disjunction ⋁<sub>i=0</sub><sup>d</sup> σ′<sub>i</sub> = σ<sub>d+1</sub> holds for some i ≤ d.

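Formula (1) gives us the following procedure to determine the diameter:

1. initialize the candidate diameter d to 1;
2. check whether the negation of formula (1) is unsatisfiable;
3. if yes, output d and terminate;
4. if not, increment d and go to step 2.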

If the procedure terminates, it outputs the diameter, which can be used as completeness threshold for bounded model checking. We implemented this procedure, and used a back-end SMT solver to automate the test in step 2.
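The iteration over candidate diameters can be illustrated on a single, explicitly given finite transition graph. This is only a sketch: the paper's procedure queries an SMT solver over all admissible parameter instances, whereas the code below swaps in explicit-state enumeration for one fixed graph. The function names, the starting value d = 1, and the cut-off `max_d` are assumptions.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # configuration -> successor configurations


def reach_exact(graph: Graph, start: Hashable, steps: int) -> Set[Hashable]:
    """Configurations reachable from start by a path of exactly `steps` transitions."""
    frontier = {start}
    for _ in range(steps):
        frontier = {nxt for cur in frontier for nxt in graph[cur]}
    return frontier


def reach_within(graph: Graph, start: Hashable, steps: int) -> Set[Hashable]:
    """Configurations reachable from start by a path of at most `steps` transitions."""
    seen = {start}
    frontier = {start}
    for _ in range(steps):
        frontier = {nxt for cur in frontier for nxt in graph[cur]}
        seen |= frontier
    return seen


def is_diameter_bound(graph: Graph, d: int) -> bool:
    """Every path of length d + 1 has a same-endpoints path of length at most d."""
    return all(
        reach_exact(graph, s, d + 1) <= reach_within(graph, s, d) for s in graph
    )


def diameter(graph: Graph, max_d: int = 50) -> int:
    """Increase the candidate d until the bound check succeeds."""
    for d in range(1, max_d + 1):
        if is_diameter_bound(graph, d):
            return d
    raise ValueError("no diameter bound found up to max_d")
```

On the deadlock-free toy graph a → b → c with a self-loop on c, the candidate d = 1 fails (c is reachable from a in two steps but not in one) and the sketch returns 2.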

#### **5.2 Bounded Diameter for a Fragment of STA**

In this section, we show that for a specific fragment of STA, we are able to give a theoretical bound on the diameter, similar to the asynchronous case [20,21].

The STA that fall in this fragment are *monotonic* and *1-cyclic*. An STA is monotonic iff every counter atom changes its truth value at most once in every path of a counter system induced by the STA and an admissible instance **p** ∈ P<sub>RC</sub>. This implies that every schedule can be partitioned into finitely many sub-schedules that satisfy a property we call *steadiness*. We call a schedule *steady* if the set of rules whose guards are satisfied is the same in all of its transitions. We also give a sufficient condition for monotonicity, using *trapped* counter atoms, defined below. In a 1-cyclic STA, the only cycles that can be formed by its rules are self-loops. Under these two conditions, we guarantee that for every steady schedule, there exists a steady schedule of bounded length that has the same origin and goal. We show that this bound depends on the counter atoms Ψ occurring in the guards of the STA, and the length of the longest path in the STA, denoted by c. The main result of this section is stated by the theorem:

**Theorem 1.** *For every feasible schedule* τ *in a counter system* CS(*STA*, **p**)*, where STA is monotonic and 1-cyclic, and* **p** ∈ P<sub>RC</sub>*, there exists a feasible schedule* τ′ *of length* O(|Ψ|c)*, such that* τ *and* τ′ *have the same origin and goal.*

To prove Theorem 1, we start by defining monotonic STA.

**Definition 6 (Monotonic STA).** *An automaton STA* = (L, I, Π, R, RC, χ) *is monotonic iff for every path* {σ<sub>i</sub>}<sub>i=0</sub><sup>k</sup> *in the counter system* CS(*STA*, **p**)*, for* **p** ∈ P<sub>RC</sub>*, and every counter atom* ψ ∈ Ψ*, we have* σ<sub>i</sub> |= ψ *implies* σ<sub>j</sub> |= ψ*, for* 0 ≤ i < j < k*.*

To show that we can partition a schedule into finitely many sub-schedules, we need the notion of a context. A *context* of a transition t ∈ T(**p**) is the set C<sub>t</sub> = {ψ ∈ Ψ | o(t) |= ψ} of counter atoms ψ satisfied in the origin o(t) of the transition t. Given a feasible schedule τ, the point i is a *context switch* if C<sub>t<sub>i−1</sub></sub> ≠ C<sub>t<sub>i</sub></sub>, for 1 < i ≤ |τ|.

**Lemma 1.** *Every feasible schedule* τ *in a counter system induced by a monotonic STA has at most* |Ψ| *context switches.*

*Proof.* Let τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> be a feasible schedule and Ψ the set of counter atoms appearing on the rules of the monotonic STA. For every ψ ∈ Ψ, there is at most one context switch i, for 1 < i ≤ k, such that ψ ∉ C<sub>t<sub>i−1</sub></sub> and ψ ∈ C<sub>t<sub>i</sub></sub>. By monotonicity, no atom that is satisfied becomes unsatisfied later, so every context switch adds at least one atom to the context, which gives at most |Ψ| context switches.
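Counting context switches amounts to comparing consecutive contexts. A minimal sketch, assuming that the contexts C<sub>t<sub>1</sub></sub>, …, C<sub>t<sub>k</sub></sub> of a schedule are already given as a list of sets of atom names (the function name and this representation are hypothetical):

```python
from typing import FrozenSet, Sequence


def context_switches(contexts: Sequence[FrozenSet[str]]) -> int:
    """Number of points i with C_{t_{i-1}} != C_{t_i} in the given sequence."""
    return sum(1 for prev, cur in zip(contexts, contexts[1:]) if prev != cur)
```

In a monotonic STA the contexts can only grow along a schedule, so for a sequence such as ∅, ∅, {ψ1}, {ψ1}, {ψ1, ψ2} the count is 2, which does not exceed |Ψ| = 2.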

*Sufficient Condition for Monotonicity.* We introduce trapped counter atoms.

**Definition 7.** *A set* L *of locations is called a* trap*, iff for every* ℓ ∈ L *and every* r ∈ R *such that* ℓ = r.from*, it holds that* r.to ∈ L*.*

*A counter atom* ψ ≡ #L ≥ *a* · *π* + b *is* trapped *iff the set* L *is a trap.*

**Lemma 2.** *Let* ψ ≡ #L ≥ *a* · *π* + b *be a trapped counter atom,* σ *a configuration such that* σ |= ψ*, and* t *a transition enabled in* σ*. If* σ →<sup>t</sup> σ′*, then* σ′ |= ψ*.*

**Corollary 1.** *Let STA* = (L, I,Π, R, RC, χ) *be an automaton such that all its counter atoms are trapped. Then STA is monotonic.*
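Checking whether a set of locations is a trap is a purely syntactic test on the rules, which makes the sufficient condition of Corollary 1 easy to automate. A minimal sketch, assuming rules are given as (from, to) pairs; the rule set in the usage note is invented for illustration and is not the automaton of Fig. 1.

```python
from typing import Iterable, Set, Tuple


def is_trap(locs: Set[str], rules: Iterable[Tuple[str, str]]) -> bool:
    """True iff every rule that starts in `locs` also ends in `locs`."""
    return all(dst in locs for src, dst in rules if src in locs)
```

With the hypothetical rules v0 → v0, v0 → se, v1 → se, se → se, se → ac, ac → ac, the set {se, ac} is a trap, while {v0} is not, because a process can leave v0 for se.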

*Steady Schedules.* We define the notion of steadiness, similarly to [20].

**Definition 8.** *A schedule* τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> *is* steady*, if* C<sub>t<sub>i</sub></sub> = C<sub>t<sub>j</sub></sub>*, for* 0 < i < j ≤ k*.*

We now focus on shortening steady schedules. That is, given a steady schedule, we construct a schedule of bounded length with the same origin and goal.

Observe that *STA* = (L, I, Π, R, RC, χ) can be seen as a directed graph G<sub>STA</sub>, with vertices corresponding to the locations ℓ ∈ L, and edges corresponding to the rules r ∈ R. We denote by c the length of the longest path between two nodes in the graph G<sub>STA</sub>, and call it the *longest chain* of *STA*. If G<sub>STA</sub> contains only cycles of length one, then *STA* is called *1-cyclic*.
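Both the 1-cyclicity test and the longest chain can be computed on the graph G<sub>STA</sub> after dropping self-loops: the STA is 1-cyclic iff the remaining graph is acyclic, and c is then the longest path in that acyclic graph. The sketch below assumes that path length is counted in edges (rules) and that rules are given as (from, to) pairs; the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def longest_chain(locations: List[str],
                  rules: Iterable[Tuple[str, str]]) -> Optional[int]:
    """Longest path (in edges) ignoring self-loops, or None if not 1-cyclic."""
    succ: Dict[str, Set[str]] = {loc: set() for loc in locations}
    for src, dst in rules:
        if src != dst:  # self-loops are the only cycles allowed
            succ[src].add(dst)

    # Kahn's algorithm: a topological order exists iff there is no other cycle.
    indeg = {loc: 0 for loc in locations}
    for loc in locations:
        for nxt in succ[loc]:
            indeg[nxt] += 1
    ready = [loc for loc in locations if indeg[loc] == 0]
    order: List[str] = []
    while ready:
        loc = ready.pop()
        order.append(loc)
        for nxt in succ[loc]:
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                ready.append(nxt)
    if len(order) != len(locations):
        return None  # a cycle of length greater than one exists

    # Longest path starting at each location, computed successors-first.
    longest = {loc: 0 for loc in locations}
    for loc in reversed(order):
        for nxt in succ[loc]:
            longest[loc] = max(longest[loc], 1 + longest[nxt])
    return max(longest.values(), default=0)
```

For the invented rule set v0 → v0, v0 → se, v1 → se, se → se, se → ac, ac → ac the result is 2 (the chain v0 → se → ac); adding a rule ac → v0 creates a longer cycle and the function returns `None`.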

To shorten steady schedules, in addition to monotonicity, we require that the STA are also 1-cyclic. In the following, we assume that the schedules we shorten come from counter systems induced by monotonic and 1-cyclic STA. Intuitively, if a given schedule is longer than the longest chain of the STA, then in some transition of the schedule some processes followed a rule which is a self-loop. As processes may follow self-loops at different transitions, we cannot shorten the given schedule by eliminating transitions as a whole. Instead, we deconstruct the original schedule into sequences of process steps, which we call *runs*, shorten the runs, and reconstruct a new shorter schedule from the shortened runs. The main challenge is to show that the newly obtained schedule is feasible and steady.

*Schedules as Multisets of Runs.* We proceed by defining runs and showing that each schedule can be represented by a multiset of runs.

We call a *run* the sequence ρ = {r<sub>i</sub>}<sub>i=1</sub><sup>k</sup> of rules, for r<sub>i</sub> ∈ R, such that r<sub>i</sub>.*to* = r<sub>i+1</sub>.*from*, for 0 < i < k. We denote by ρ[i] = r<sub>i</sub> the i-th rule in the run ρ, and by |ρ| the length of the run. The following lemma shows that a feasible schedule can be deconstructed into a multiset of runs.

**Lemma 3.** *For every feasible schedule* τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup>*, there exists a multiset* (P, m)*, where*


A multiset (P, m) of runs of length k defines a schedule τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> of length k, and we have t<sub>i</sub>(r) = Σ<sub>ρ∈P ∧ ρ[i]=r</sub> m(ρ), for every rule r ∈ R and 0 < i ≤ k.
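The reconstruction of a schedule from a multiset of runs follows the sum above position by position. A minimal sketch, assuming that a run is a tuple of rule names and that the multiset is a dictionary from runs to multiplicities (both representations and the function name are hypothetical):

```python
from collections import Counter
from typing import Dict, List, Tuple

Run = Tuple[str, ...]  # a run, written as the sequence of its rule names


def schedule_from_runs(runs: Dict[Run, int]) -> List[Counter]:
    """Build t_1, ..., t_k with t_i(r) = sum of m(run) over runs whose i-th rule is r."""
    lengths = {len(run) for run in runs}
    if len(lengths) > 1:
        raise ValueError("all runs must have the same length k")
    k = lengths.pop() if lengths else 0
    schedule = [Counter() for _ in range(k)]
    for run, multiplicity in runs.items():
        for i, rule in enumerate(run):
            schedule[i][rule] += multiplicity
    return schedule
```

For example, if two processes follow the run (a, b) and one process follows the run (c, b), the first transition has factors a = 2, c = 1 and the second has the single factor b = 3. This sketch does not check that consecutive rules of a run are connected.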

For the counter systems of STA, which are both monotonic and 1-cyclic, we show that their steady schedules can be shortened, so that their length does not exceed the longest chain c (that is, the length of the longest path in the STA).

**Lemma 4.** *Let* τ *be a steady feasible schedule in a counter system induced by a monotonic and 1-cyclic STA. If* |τ| > c + 1*, then there exists a steady feasible schedule* τ′ *such that* |τ′| = |τ| − 1*, and* τ, τ′ *have the same origin and goal.*

*Proof (Sketch).* If τ = {t<sub>i</sub>}<sub>i=1</sub><sup>k+1</sup>, with |τ| = k + 1 > c + 1, is a steady schedule, then C<sub>t<sub>1</sub></sub> = C<sub>t<sub>k</sub></sub>, and its prefix θ = {t<sub>i</sub>}<sub>i=1</sub><sup>k</sup> is a steady and feasible schedule, with k > c. By Lemma 3, there is a multiset (P, m) of runs of length k describing θ. Since k > c, and c is the longest chain in the STA, which is 1-cyclic, it must be the case that every run in P contains at least one self-loop. Construct a new multiset (P′, m′) of runs of length k − 1, such that each ρ′ ∈ P′ is obtained from some ρ ∈ P by removing one occurrence of a self-loop rule. The multiset (P′, m′) defines the schedule θ′ = {t′<sub>i</sub>}<sub>i=1</sub><sup>k−1</sup>. Because of the monotonicity and steadiness of θ, and because we only remove self-loops (which go from and to the same location) when we build θ′ from θ, the feasibility is preserved, that is, it holds that g(t′<sub>i−1</sub>) = o(t′<sub>i</sub>), for 1 < i < k, and that no guards false in θ become true in θ′. Furthermore, it is easy to check that θ′ has the same origin and goal as θ. As the goal of θ is the origin of t<sub>k+1</sub>, construct a schedule τ′ = {t′<sub>i</sub>}<sub>i=1</sub><sup>k</sup>, where t′<sub>k</sub> = t<sub>k+1</sub>. As τ is steady, the transitions t<sub>1</sub> and t<sub>k+1</sub> have the same contexts. From o(t<sub>1</sub>) = o(t′<sub>1</sub>) and o(t<sub>k+1</sub>) = o(t′<sub>k</sub>), we get that t′<sub>1</sub> and t′<sub>k</sub> have the same contexts, which, together with the monotonicity, implies that τ′ is steady.

As a consequence of Lemmas 1 and 4, we obtain Theorem 1, which tells us that for any feasible schedule, there exists a feasible schedule of length O(|Ψ|c). This bound does not depend on the parameters, but on the number of context switches and the longest chain c, which are properties of the STA.
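One way to read the constant behind the O(|Ψ|c) bound is the following counting argument. It is a sketch, not a statement taken from the proof: it assumes that a schedule with s context switches splits into s + 1 maximal steady sub-schedules θ<sub>1</sub>, …, θ<sub>s+1</sub>, and that Lemma 4 is applied repeatedly to each of them until its length is at most c + 1.

$$|\tau'| \;=\; \sum_{j=1}^{s+1} |\theta'_j| \;\le\; (s + 1)(c + 1) \;\le\; (|\Psi| + 1)(c + 1), \qquad \text{since } s \le |\Psi| \text{ by Lemma 1.}$$

For |Ψ| ≥ 1 and c ≥ 1 the right-hand side is at most 4|Ψ|c, which is consistent with the stated bound O(|Ψ|c).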

### **6 Bounded Model Checking of Safety Properties**

Once we obtain the diameter bound d (either using the procedure from Sect. 5.1, or by Theorem 1), we use it as a completeness threshold for bounded model checking. For the algorithms that we verify, we express the violations of their safety properties as reachability queries on bounded executions. The length of the bounded executions depends on d, and on whether the algorithm was designed such that it is assumed that there is a clean round in every execution.

*Checking Safety for Algorithms that do not Assume a Clean Round.* Here, we search for violations of safety properties in executions of length e ≤ d, by checking satisfiability of the formula:

$$\exists \mathbf{p} \in P_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ \mathit{Init}(\sigma_0) \land \mathit{Path}(\sigma_0, \sigma_e, e) \land \mathit{Bad}(\sigma_e) \tag{2}$$

where the predicate *Init*(σ) encodes that σ is an initial configuration, together with the constraints imposed on the initial configuration by the safety property, and *Bad*(σ) encodes the bad configuration, which, if reachable, violates safety.

For example, the algorithm in Fig. 1 has to satisfy the safety property *unforgeability*: If no process sets v to 1 initially, then no process ever sets accept to true. In our encoding, we check executions of length e ≤ d, whose initial configuration has the counter κ[v1] = 0. In a bad configuration, the counter κ[ac] > 0. Thus, to find violations of unforgeability, in formula (2), we set:

$$\begin{aligned} \mathit{Init}(\sigma_0) &\equiv \sigma_0[\mathrm{v0}] + \sigma_0[\mathrm{v1}] = N(\mathbf{p}) \land \sigma_0[\mathrm{v1}] = 0\\ \mathit{Bad}(\sigma_e) &\equiv \sigma_e[\mathrm{ac}] > 0 \end{aligned}$$
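The shape of the bounded reachability query can be illustrated without an SMT solver. The sketch below is not the paper's encoding: it swaps in explicit breadth-first enumeration for one fixed, finite transition system, and the names `find_violation`, `successors` and `is_bad` are hypothetical. It searches for a bad configuration on executions of length e ≤ bound, where the bound plays the role of the diameter d.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

State = Hashable


def find_violation(init_states: Iterable[State],
                   successors: Callable[[State], Iterable[State]],
                   is_bad: Callable[[State], bool],
                   bound: int) -> Optional[Tuple[State, ...]]:
    """Return an execution of length at most `bound` ending in a bad state, or None."""
    frontier = [(s, (s,)) for s in init_states]
    seen = {s for s, _ in frontier}
    for _ in range(bound + 1):
        # Check all configurations reached at the current depth.
        for state, path in frontier:
            if is_bad(state):
                return path
        # Expand to the next depth, keeping only unseen configurations.
        next_frontier = []
        for state, path in frontier:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    next_frontier.append((nxt, path + (nxt,)))
        frontier = next_frontier
    return None
```

On a toy system whose states are integers, where a state s < 3 may stay or move to s + 1 and the bad state is 3, no violation is found with bound 2, and the execution (0, 1, 2, 3) is returned with bound 3.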

*Checking Safety for Algorithms with a Clean Round.* We check for violations of safety in executions of length e ≤ 2d, where e = e<sub>1</sub> + e<sub>2</sub> such that: (i) we find all reachable clean-round configurations in an execution of length e<sub>1</sub>, for e<sub>1</sub> ≤ d, such that the last configuration σ<sub>e<sub>1</sub></sub> satisfies the clean round condition, and (ii) we check if a bad configuration is reachable from σ<sub>e<sub>1</sub></sub> by a path of length e<sub>2</sub> ≤ d. That is, we check satisfiability of the formula:

$$\begin{aligned} \exists \mathbf{p} \in P_{RC}.\ \exists \sigma_0, \dots, \sigma_e.\ \exists t_1, \dots, t_e.\ &\mathit{Init}(\sigma_0) \land \mathit{Path}(\sigma_0, \sigma_{e_1}, e_1) \\ &\land \mathit{Clean}(\sigma_{e_1}) \land \mathit{Path}(\sigma_{e_1}, \sigma_e, e_2) \land \mathit{Bad}(\sigma_e) \end{aligned} \tag{3}$$

where the predicate *Clean*(σ) encodes the clean round condition.

For example, one of the safety properties that the *FloodMin* algorithm for k = 1 (Fig. 2) has to satisfy is *k-agreement*, which requires that at most k different values are decided. In the original algorithm, the processes decide after ⌊t/k⌋ + 1 rounds, such that at least one of them is the clean round, in which at most k − 1 processes crash. In our encoding, we check paths of length e ≤ 2d. We enforce the clean round condition by asserting that the sum of the counters of the locations c0, c1 is k − 1 = 0 in the configuration σ<sub>e<sub>1</sub></sub>. The property 1-agreement is violated if in the last configuration both the counters κ[v0] and κ[v1] are non-zero. That is, to check 1-agreement, in formula (3) we set:

$$\begin{aligned} \mathit{Init}(\sigma_0) &\equiv \sigma_0[\mathrm{v0}] + \sigma_0[\mathrm{v1}] + \sigma_0[\mathrm{c0}] + \sigma_0[\mathrm{c1}] = N(\mathbf{p})\\ \mathit{Clean}(\sigma_{e_1}) &\equiv \sigma_{e_1}[\mathrm{c0}] + \sigma_{e_1}[\mathrm{c1}] = 0\\ \mathit{Bad}(\sigma_e) &\equiv \sigma_e[\mathrm{v0}] > 0 \land \sigma_e[\mathrm{v1}] > 0 \end{aligned}$$

### **7 Experimental Evaluation**

The algorithms that we model using STA and verify by bounded model checking are designed for different fault models, which in our case are crashes, send omissions or Byzantine faults. We now proceed by introducing our benchmarks. Their encodings, together with the implementations of the procedures for finding the diameter and applying bounded model checking are available at [1].

*Algorithms without a Clean Round Assumption.* We consider three variants of the synchronous reliable broadcast algorithm, whose STA are monotonic and 1-cyclic (i.e., Theorem 1 applies). These algorithms assume different fault models:


*Algorithms with a Clean Round.* We encode several algorithms from this class, that solve the consensus or k-set agreement problem:


These algorithms have a structure similar to the one depicted in Fig. 2, with the exception of phase king and phase queen. Their loop body consists of several message exchange steps, which correspond to multiple rounds, grouped in a *phase*. In each phase, a designated process acts as a coordinator.

*Computing the Diameter.* We implemented the procedure from Sect. 5.1 in Python. The implementation uses a back-end SMT solver (currently, z3 and cvc4). Our tool computed diameter bounds for all of our benchmarks, even for those for which we do not have a theoretical guarantee. Our experiments reveal extremely low values for the diameter, that range between 1 and 4. The values for the diameter and the time needed to compute them are presented in Table 2.

**Table 2.** Results for our benchmarks, available at [1]: |L|, |R|, |Ψ|, RC are the number of locations, rules, atomic guards, and the resilience condition in each STA; d is the diameter computed using SMT, c is the longest chain of the algorithms whose STA are monotonic and 1-cyclic; τ is the time (in seconds) to compute the diameter using SMT; T, *SMT* is the time to check reachability using the diameter computed using the SMT procedure from Sect. 5.1; T, *Theorem* 1 is the time to check reachability using the bound obtained by Theorem 1. For the cases where Theorem 1 is not applicable, we write (–). The experiments were run on a machine with Intel(R) Core(TM) i5-4210U CPU and 4GB of RAM, using z3-4.8.1 and cvc4-1.6.


*Checking the Algorithms.* We have implemented another Python function which encodes violations of the safety properties as reachability properties on paths of bounded length, as described in Sect. 6, and uses a back-end SMT solver to check their satisfiability. Table 2 contains the results that we obtained by checking reachability for our benchmarks, using the diameter bound computed using the procedure from Sect. 5.1, and diameter bound from Theorem 1, for algorithms whose STA are monotonic and 1-cyclic.

To our knowledge, we are the first to verify the listed algorithms that work with send omission, Byzantine and hybrid faults. For the algorithms with crash faults, our approach is a significant improvement over the results obtained using the abstraction-based method from [3].

*Counterexamples.* Our tool found a bug in the version of the phase king algorithm that was given in [8], which was corrected in the version of the algorithm in [9]. The version from [8] had the wrong threshold '> n − t' in one guard, while the one in [9] had '≥ n − t' for the same guard. To test our tool, we produced erroneous encodings for our benchmarks, and checked them. For rb, rb hybrid, rb omit, phase king, and phase queen, we tweaked the resilience condition, and introduced more faults than expected by the algorithm, e.g., by setting f>t (instead of f ≤ t) in the STA in Fig. 1. For fair cons, floodmin, floodset, and kset omit, we checked executions without a clean round. For all of the erroneous encodings, our tool produces counterexamples in seconds.

#### **8 Discussion and Related Work**

Parameterized verification of synchronous and partially synchronous distributed algorithms has recently gained attention. Both models have in common that distributed computations are organized in rounds and processes (conceptually) move in lock-step. For partially synchronous consensus algorithms, the authors of [15] introduced a consensus logic and (semi-)decision procedures. Later, the authors of [27] introduced a language for partially synchronous consensus algorithms, and proved cut-off theorems specialized to the properties of consensus: agreement, validity, and termination. Concerning synchronous algorithms, the authors of [3] introduced an abstraction-based model checking technique for crash-tolerant synchronous algorithms with existential guards. In contrast to their work, we allow more general guards that contain linear expressions over the parameters, e.g., n − t. Our method offers more automation, and our experimental evaluation shows that our technique is faster than the technique of [3].

We introduce a *synchronous* variant of threshold automata, which were proposed in [21] for asynchronous algorithms. Several extensions of this model were recently studied in [23], but the synchronous case was not considered. STA extend the guarded protocols by [16], in which a process can check only if a sum of counters is different from 0 or n. Generalizing the results from [16] to STA is not straightforward. In [2], safety of finite-state transition systems over infinite data domains was reduced to backwards reachability checking using a fixpoint computation, as long as the transition systems are well-structured. It would be interesting to put our results in this context. A decidability result for liveness properties of parameterized timed networks was obtained in [4], employing linear programming for the analysis of vector addition systems with a parametric initial state. We plan to investigate the use of similar ideas for analyzing liveness properties of STA.

The 1-cyclicity condition is reminiscent of flat counter automata [25]. In Fig. 3, we show a possible translation of an STA to a counter automaton (similar to the translation for asynchronous threshold automata from [23]). We note that the counter automaton is not flat, due to the presence of the outer loop, which models a transition to the next round. By knowing a bound d on the diameter (e.g., by Theorem 1), one can flatten the counter automaton by unfolding the outer loop d times. We also experimented with FAST [6] on two of our benchmarks: rb and floodmin for k = 1, depicted in Figs. 1 and 2 respectively. FAST terminated on rb, but took significantly longer than our tool on the same machine (i.e., hours rather than seconds). FAST ran out of memory when checking floodmin.

**Fig. 3.** A counter automaton for the STA in Fig. 1, with φ<sub>0</sub> ≡ x < t + 1, φ<sub>1</sub> ≡ x + f ≥ t + 1, φ<sub>2</sub> ≡ x + f ≥ n − t, φ<sub>3</sub> ≡ x < n − t, where x counts the number of processes in locations v1, se, ac; and n, t, f are counters for the parameters. On a path from s<sub>0</sub> to s<sub>7</sub>, the counters ℓ ∈ {v0, v1, se, ac} are emptied, while the counters n<sub>ℓ</sub> are populated. This models the transitions from one location to another in the current round.

Our experiments show that STA that are neither monotonic, nor 1-cyclic still may have bounded diameters. Finding other classes of STA for which one could derive the diameter bounds is a subject of future work. Although we considered only reachability properties in this work—which happened to be challenging—we are going to investigate completeness thresholds for liveness in the future.

### **References**



# **Measuring Masking Fault-Tolerance**

Pablo F. Castro<sup>1,3(✉)</sup>, Pedro R. D'Argenio<sup>2,3,4</sup>, Ramiro Demasi<sup>2,3</sup>, and Luciano Putruele<sup>1,3</sup>

<sup>1</sup> Departamento de Computación, FCEFQyN, Universidad Nacional de Río Cuarto, Río Cuarto, Córdoba, Argentina, {pcastro,lputruele}@dc.exa.unrc.edu.ar

<sup>2</sup> FaMAF, Universidad Nacional de Córdoba, Córdoba, Argentina, {dargenio,rdemasi}@famaf.unc.edu.ar

<sup>3</sup> Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Buenos Aires, Argentina

<sup>4</sup> Saarland University, Saarbrücken, Germany

**Abstract.** In this paper we introduce a notion of fault-tolerance distance between labeled transition systems. Intuitively, this notion of distance measures the degree of fault-tolerance exhibited by a candidate system. In practice, there are different kinds of fault-tolerance; here we restrict ourselves to the analysis of masking fault-tolerance because it is often a highly desirable goal for critical systems. Roughly speaking, a system is masking fault-tolerant when it is able to completely mask the faults, not allowing these faults to have any observable consequences for the users. We capture masking fault-tolerance via a simulation relation, which is accompanied by a corresponding game characterization. We enrich the resulting games with quantitative objectives to define the notion of masking fault-tolerance distance. Furthermore, we investigate the basic properties of this notion of masking distance, and we prove that it is a directed semimetric. We have implemented our approach in a prototype tool that automatically computes the masking distance between a nominal system and a fault-tolerant version of it. We have used this tool to measure the masking tolerance of multiple instances of several case studies.

### **1 Introduction**

Fault-tolerance allows for the construction of systems that are able to overcome the occurrence of faults during their execution. Examples of fault-tolerant systems can be found everywhere: communication protocols, hardware circuits, avionic systems, cryptographic currencies, etc. So, the increasing relevance of critical software in everyday life has led to a renewed interest in the automatic verification of fault-tolerant properties. However, one of the main difficulties when reasoning about these kinds of properties is given by their quantitative nature, which is true even for non-probabilistic systems. A simple example is given by the introduction of redundancy in critical systems. This is, by far, one of the most used techniques in fault-tolerance. In practice, it is well-known that adding more redundancy to a system increases its reliability. Measuring this increment is a central issue for evaluating fault-tolerant software, protocols, etc. On the other hand, the formal characterization of fault-tolerant properties can be an involved task; usually these properties are encoded using *ad-hoc* mechanisms as part of a general design.

This work was supported by grants ANPCyT PICT-2017-3894 (RAFTSys), ANPCyT PICT 2016-1384, SeCyT-UNC 33620180100354CB (ARES), and the ERC Advanced Grant 695614 (POWVER).

The usual flow for the design and verification of fault-tolerant systems consists in defining a nominal model (i.e., the "fault-free" or "ideal" program) and afterwards extending it with faulty behaviors that deviate from the normal behavior prescribed by the nominal model. This extended model represents the way in which the system operates under the occurrence of faults. There are different ways of extending the nominal model; the typical approach is *fault injection* [20,21], that is, the automatic introduction of faults into the model. An important property that any extended model has to satisfy is the preservation of the normal behavior in the absence of faults. In [11], we proposed an alternative formal approach for dealing with the analysis of fault-tolerance. This approach allows for a fully automated analysis and appropriately distinguishes faulty behaviors from normal ones. Moreover, this framework is amenable to fault injection. In that work, three notions of simulation relations are defined to characterize *masking*, *nonmasking*, and *failsafe* fault-tolerance, as originally defined in [15].

During the last decade, significant progress has been made towards defining suitable metrics or distances for diverse types of quantitative models including real-time systems [19], probabilistic models [12], and metrics for linear and branching systems [6,8,18,23,29]. Some authors have already pointed out that these metrics can be useful to reason about the robustness of a system, a notion related to fault-tolerance. Particularly, in [6] the traditional notion of simulation relation is generalized and three different simulation distances between systems are introduced, namely *correctness*, *coverage*, and *robustness*. These are defined using quantitative games with *discounted-sum* and *mean-payoff* objectives.

In this paper we introduce a notion of fault-tolerance distance between labelled transition systems. Intuitively, this distance measures the degree of fault-tolerance exhibited by a candidate system. As mentioned above, there exist different levels of fault-tolerance; we restrict ourselves to the analysis of *masking fault-tolerance* because it is often classified as the most benign kind of fault-tolerance and it is a highly desirable property for critical systems. Roughly speaking, a system is masking fault-tolerant when it is able to completely mask the faults, not allowing these faults to have any observable consequences for the users. Formally, the system must preserve both the safety and liveness properties of the nominal model [15]. In contrast to the robustness distance defined in [6], which measures how many unexpected errors are tolerated by the implementation, we consider a specific collection of faults given in the implementation and measure how many faults are tolerated by the implementation in such a way that they can be masked by the states. We also require that the normal behavior of the specification be preserved by the implementation when no faults are present. In this case, we have a bisimulation between the specification and the non-faulty behavior of the implementation. Otherwise, the distance is 1. That is, $\delta_m(N, I) = 1$ if and only if the nominal model $N$ and $I \backslash \mathcal{F}$ are not bisimilar, where $I \backslash \mathcal{F}$ behaves like the implementation $I$ with all actions in $\mathcal{F}$ forbidden ($\backslash$ is Milner's restriction operator). Thus, we effectively distinguish between the nominal model and its fault-tolerant version and the set of faults taken into account.

In order to measure the degree of masking fault-tolerance of a given system, we start by characterizing masking fault-tolerance via simulation relations between two systems as defined in [11]. The first system acts as a specification of the intended behavior (i.e., the nominal model) and the second one as the fault-tolerant implementation (i.e., the extended model with faulty behavior). The existence of a masking relation implies that the implementation masks the faults. Afterwards, we introduce a game characterization of masking simulation and we enrich the resulting games with quantitative objectives to define the notion of *masking fault-tolerance distance*, where the possible values of the game belong to the interval $[0, 1]$. The fault-tolerant implementation is masking fault-tolerant if the value of the game is 0. Furthermore, the bigger the number, the farther the masking distance between the fault-tolerant implementation and the specification. Accordingly, a bigger distance means a lower degree of fault-tolerance. Thus, for a given nominal model $N$ and two different fault-tolerant implementations $I_1$ and $I_2$, our distance ensures that $\delta_m(N, I_1) < \delta_m(N, I_2)$ whenever $I_1$ tolerates more faults than $I_2$. We also provide a weak version of masking simulation, which makes it possible to deal with complex systems composed of several interacting components. We prove that masking distance is a directed semimetric, that is, it satisfies two basic properties of any distance: reflexivity and the triangle inequality.

Finally, we have implemented our approach in a tool that takes as input a nominal model and its fault-tolerant implementation and automatically computes the masking distance between them. We have used this tool to measure the masking tolerance of multiple instances of several case studies such as a redundant cell memory, a variation of the dining philosophers problem, the bounded retransmission protocol, N-Modular-Redundancy, and the Byzantine generals problem. These are typical examples of fault-tolerant systems.

The remainder of the paper is structured as follows. In Sect. 2, we introduce preliminary notions used throughout this paper. In Sect. 3, we present the formal definition of masking distance, built on quantitative simulation games, and we also prove its basic properties. In Sect. 4, we describe the experimental evaluation on some well-known case studies. In Sect. 5, we discuss the related work. Finally, in Sect. 6, we discuss some conclusions and directions for further work. Full details and proofs can be found in [5].

### **2 Preliminaries**

Let us introduce some basic definitions and results on game theory that will be necessary throughout the paper; the interested reader is referred to [2].

A *transition system* (TS) is a tuple $A = \langle S, \Sigma, E, s_0 \rangle$, where $S$ is a finite set of states, $\Sigma$ is a finite alphabet, $E \subseteq S \times \Sigma \times S$ is a set of labelled transitions, and $s_0$ is the initial state. In the following we use $s \xrightarrow{e} s' \in E$ to denote $(s, e, s') \in E$. Let $|S|$ and $|E|$ denote the number of states and edges, respectively. We define $post(s) = \{s' \in S \mid s \xrightarrow{e} s' \in E\}$ as the set of successors of $s$. Similarly, $pre(s') = \{s \in S \mid s \xrightarrow{e} s' \in E\}$ is the set of predecessors of $s'$. Moreover, $post^*(s)$ denotes the states which are reachable from $s$. Without loss of generality, we require that every state $s$ has a successor, i.e., $\forall s \in S : post(s) \neq \emptyset$. A run in a transition system $A$ is an infinite path $\rho = \rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \cdots \in (S \cdot \Sigma)^{\omega}$ where $\rho_0 = s_0$ and for all $i$, $\rho_i \xrightarrow{\sigma_i} \rho_{i+1} \in E$. From now on, given a tuple $(x_0, \dots, x_n)$, we denote $x_i$ by $\text{pr}_i((x_0, \dots, x_n))$.
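To make the notation concrete, the following is a minimal Python sketch of a finite transition system with the operators $post$, $pre$, and $post^*$. The class name, method names, and the three-state example are illustrative assumptions and are not taken from the tool described in this paper; in particular, the sketch treats $post^*$ as reflexive, so a state is counted as reachable from itself.

```python
from collections import deque


class TransitionSystem:
    """Finite labelled transition system <S, Sigma, E, s0> (illustrative sketch)."""

    def __init__(self, states, alphabet, edges, initial):
        self.states = set(states)
        self.alphabet = set(alphabet)
        self.edges = set(edges)  # triples (s, e, s')
        self.initial = initial

    def post(self, s):
        """Successors of s."""
        return {t for (u, _, t) in self.edges if u == s}

    def pre(self, t):
        """Predecessors of t."""
        return {u for (u, _, v) in self.edges if v == t}

    def post_star(self, s):
        """States reachable from s (assumption: s itself is included)."""
        seen, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for t in self.post(u) - seen:
                seen.add(t)
                queue.append(t)
        return seen

    def is_total(self):
        """True if every state has at least one successor."""
        return all(self.post(s) for s in self.states)


# Hypothetical example: s0 and s1 alternate, s2 is an unreachable self-loop.
ts = TransitionSystem(
    states={"s0", "s1", "s2"},
    alphabet={"a", "b"},
    edges={("s0", "a", "s1"), ("s1", "b", "s0"), ("s2", "a", "s2")},
    initial="s0",
)
```

The totality check corresponds to the requirement that every state has a successor, which is what allows every run to be infinite.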

A *game graph* $G$ is a tuple $G = \langle S, S_1, S_2, \Sigma, E, s_0 \rangle$ where $S$, $\Sigma$, $E$ and $s_0$ are as in transition systems and $(S_1, S_2)$ is a partition of $S$. The choice of the next state is made by Player 1 (Player 2) when the current state is in $S_1$ (respectively, $S_2$). A weighted game graph is a game graph along with a weight function $v^G$ from $E$ to $\mathbb{Q}$. A run in the game graph $G$ is called a *play*. The set of all plays is denoted by $\Omega$.

Given a game graph $G$, a *strategy* for Player 1 is a function $\pi : (S \cdot \Sigma)^* S_1 \to \Sigma \times S$ such that for all $\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i \in (S \cdot \Sigma)^* S_1$, we have that if $\pi(\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i) = (\sigma, \rho)$, then $\rho_i \xrightarrow{\sigma} \rho \in E$. A strategy for Player 2 is defined in a similar way. The set of all strategies for Player $p$ is denoted by $\Pi_p$. A strategy for Player $p$ is said to be memoryless (or positional) if it can be defined by a mapping $f : S_p \to E$ such that for all $s \in S_p$ we have that $\text{pr}_0(f(s)) = s$; that is, these strategies do not need memory of the past history. Furthermore, a play $\rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \dots$ conforms to a Player $p$ strategy $\pi$ if $\forall i \geq 0 : (\rho_i \in S_p) \Rightarrow (\sigma_i, \rho_{i+1}) = \pi(\rho_0 \sigma_0 \rho_1 \sigma_1 \dots \rho_i)$. The *outcome* of a Player 1 strategy $\pi_1$ and a Player 2 strategy $\pi_2$ is the unique play, named $out(\pi_1, \pi_2)$, that conforms to both $\pi_1$ and $\pi_2$.
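As an illustration of how two memoryless strategies determine a unique play, the sketch below unrolls a finite prefix of the outcome. The function name `outcome_prefix`, the dictionary encoding of strategies, and the two-state example are assumptions made for this example only.

```python
def outcome_prefix(initial, owner, strategies, steps):
    """Return the first `steps` moves of the play conforming to both strategies.

    owner maps each state to 1 or 2 (the player who moves there);
    strategies[p] maps each state of Player p to an edge (s, label, s').
    """
    play, current = [initial], initial
    for _ in range(steps):
        src, label, dst = strategies[owner[current]][current]
        if src != current:
            # A memoryless strategy f must satisfy pr_0(f(s)) = s.
            raise ValueError("strategy returns an edge that does not leave the state")
        play += [label, dst]
        current = dst
    return play


# Hypothetical two-state game: Player 1 owns "a", Player 2 owns "b".
owner = {"a": 1, "b": 2}
strategies = {
    1: {"a": ("a", "x", "b")},
    2: {"b": ("b", "y", "a")},
}
prefix = outcome_prefix("a", owner, strategies, 3)
```

Because each strategy depends only on the current state, the play is fully determined by the initial state, which is why the outcome is unique.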

A *game* is made of a game graph and a boolean or quantitative objective. A *boolean objective* is a function $\Phi : \Omega \to \{0, 1\}$ and the goal of Player 1 in a game with objective $\Phi$ is to select a strategy so that the outcome maps to 1, independently of what Player 2 does. On the contrary, the goal of Player 2 is to ensure that the outcome maps to 0. Given a boolean objective $\Phi$, a play $\rho$ is *winning* for Player 1 (resp. Player 2) if $\Phi(\rho) = 1$ (resp. $\Phi(\rho) = 0$). A strategy $\pi$ is a *winning strategy* for Player $p$ if every play conforming to $\pi$ is winning for Player $p$. We say that a game with boolean objective is *determined* if some player has a winning strategy, and we say that it is memoryless determined if that winning strategy is memoryless. Reachability games are those games whose objective functions are defined as $\Phi(\rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \sigma_2 \dots) = (\exists i : \rho_i \in V)$ for some set $V \subseteq S$; a standard result is that reachability games are memoryless determined.
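The standard way to solve a reachability game is a backward attractor computation. The following Python sketch illustrates that textbook construction; it is not code from this paper, and the function name, the encoding of edges as unlabelled pairs, and the example graph are assumptions. It returns the set of states from which the given player can force a visit to the target set.

```python
def attractor(states, owner, edges, target, player):
    """States from which `player` can force the play into `target`.

    edges is a set of pairs (u, v); owner[u] names the player moving at u.
    """
    post = {s: set() for s in states}
    pre = {s: set() for s in states}
    for (u, v) in edges:
        post[u].add(v)
        pre[v].add(u)
    # For opponent states: number of successors not yet known to be winning.
    pending = {s: len(post[s]) for s in states}
    win = set(target)
    frontier = list(target)
    while frontier:
        v = frontier.pop()
        for u in pre[v]:
            if u in win:
                continue
            if owner[u] == player:
                # One winning successor is enough for the player.
                win.add(u)
                frontier.append(u)
            else:
                # The opponent loses only if every successor is winning.
                pending[u] -= 1
                if pending[u] == 0:
                    win.add(u)
                    frontier.append(u)
    return win


# Hypothetical game: Player 1 wants to reach "g"; "d" is a safe sink for Player 2.
states = {"a", "b", "c", "d", "e", "g"}
owner = {"a": 1, "b": 2, "c": 1, "d": 2, "e": 2, "g": 1}
edges = {("a", "b"), ("b", "g"), ("b", "c"), ("c", "g"), ("c", "d"),
         ("d", "d"), ("e", "g"), ("e", "d"), ("g", "g")}
winning = attractor(states, owner, edges, {"g"}, player=1)
```

Each edge is inspected at most once, which gives the linear running time usually quoted for reachability games. A memoryless winning strategy can be read off by recording, for each state added to `win`, the successor that caused it to be added.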

A *quantitative objective* is given by a *payoff* function $f : \Omega \to \mathbb{R}$ and the goal of Player 1 is to maximize the value $f$ of the play, whereas the goal of Player 2 is to minimize it. For a quantitative objective $f$, the value of the game for a Player 1 strategy $\pi_1$, denoted by $v_1(\pi_1)$, is defined as the infimum over all the values resulting from Player 2 strategies, i.e., $v_1(\pi_1) = \inf_{\pi_2 \in \Pi_2} f(out(\pi_1, \pi_2))$. The value of the game for Player 1 is defined as the supremum of the values of all Player 1 strategies, i.e., $\sup_{\pi_1 \in \Pi_1} v_1(\pi_1)$. Analogously, the value of the game for a Player 2 strategy $\pi_2$ and the value of the game for Player 2 are defined as $v_2(\pi_2) = \sup_{\pi_1 \in \Pi_1} f(out(\pi_1, \pi_2))$ and $\inf_{\pi_2 \in \Pi_2} v_2(\pi_2)$, respectively. We say that a game is determined if both values are equal, that is: $\sup_{\pi_1 \in \Pi_1} v_1(\pi_1) = \inf_{\pi_2 \in \Pi_2} v_2(\pi_2)$. In this case we denote by $val(G)$ the value of game $G$. The following result from [24] characterizes a large set of determined games.

**Theorem 1.** *Any game with a quantitative function* f *that is bounded and Borel measurable is determined.*

### **3 Masking Distance**

We start by defining masking simulation. In [11], we defined a state-based simulation for masking fault-tolerance; here we recast this definition using labelled transition systems. First, let us introduce some concepts needed for defining masking fault-tolerance. For any vocabulary $\Sigma$ and set of labels $\mathcal{F} = \{F_0, \dots, F_n\}$ not belonging to $\Sigma$, we consider $\Sigma_{\mathcal{F}} = \Sigma \cup \mathcal{F}$, where $\mathcal{F} \cap \Sigma = \emptyset$. Intuitively, the elements of $\mathcal{F}$ indicate the occurrence of a fault in a faulty implementation. Furthermore, sometimes it will be useful to consider the set $\Sigma^i = \{e^i \mid e \in \Sigma\}$, containing the elements of $\Sigma$ indexed with superscript $i$. Moreover, for any vocabulary $\Sigma$ we consider $\Sigma_M = \Sigma \cup \{M\}$, where $M \notin \Sigma$; intuitively, this label is used to identify masking transitions.

Given a transition system $A = \langle S, \Sigma, E, s_0 \rangle$ over a vocabulary $\Sigma$, we denote $A^M = \langle S, \Sigma_M, E^M, s_0 \rangle$ where $E^M = E \cup \{s \xrightarrow{M} s \mid s \in S\}$.

#### **3.1 Strong Masking Simulation**

**Definition 1.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$ *be two transition systems.* $A'$ *is* strong masking fault-tolerant *with respect to* $A$ *if there exists a relation* $\mathbf{M} \subseteq S \times S'$ *between* $A^M$ *and* $A'$ *such that:*

*(A)* $s_0 \mathrel{\mathbf{M}} s'_0$*, and*

*(B) for all* $s \in S$, $s' \in S'$ *with* $s \mathrel{\mathbf{M}} s'$ *and all* $e \in \Sigma$ *the following holds:*

*(1) if* $(s \xrightarrow{e} t) \in E$ *then* $\exists\, t' \in S' : (s' \xrightarrow{e} t' \wedge t \mathrel{\mathbf{M}} t')$*;*

*(2) if* $(s' \xrightarrow{e} t') \in E'$ *then* $\exists\, t \in S : (s \xrightarrow{e} t \wedge t \mathrel{\mathbf{M}} t')$*;*

*(3) if* $(s' \xrightarrow{F} t')$ *for some* $F \in \mathcal{F}$ *then* $\exists\, t \in S : (s \xrightarrow{M} t \wedge t \mathrel{\mathbf{M}} t')$*.*

*If such a relation exists we say that* $A'$ *is a* strong masking fault-tolerant implementation *of* $A$*, denoted by* $A \preceq_m A'$*.*

We say that state $s'$ is masking fault-tolerant for $s$ when $s \mathrel{\mathbf{M}} s'$. Intuitively, the definition states that, starting in $s'$, faults can be masked in such a way that the behavior exhibited is the same as that observed when starting from $s$ and executing transitions without faults. In other words, a masking relation ensures that every faulty behavior in the implementation can be simulated by the specification. Let us explain the above definition in more detail. First, note that conditions A, B.1, and B.2 imply that we have a bisimulation when $A$ and $A'$ do not exhibit faulty behavior. In particular, condition B.1 says that the normal execution of $A$ can be simulated by an execution of $A'$. On the other hand, condition B.2 says that the implementation does not add normal (non-faulty) behavior. Finally, condition B.3 states that every outgoing faulty transition ($F$) from $s'$ must be matched to an outgoing masking transition ($M$) from $s$.
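Conditions A and B.1 to B.3 can be checked mechanically for a candidate relation. The sketch below is one possible way to do so in Python; the encoding of systems as triples `(states, edges, initial)`, the function names, and the toy systems are assumptions made for illustration and do not come from the paper's tool.

```python
MASK = "M"  # the masking label, assumed not to occur in the alphabet


def succ(edges, s, e):
    """Targets of e-labelled transitions leaving s."""
    return {t for (u, a, t) in edges if u == s and a == e}


def is_masking_relation(rel, spec, impl, sigma, faults):
    """Check conditions A and B.1-B.3 of Definition 1 for the candidate `rel`.

    spec and impl are triples (states, edges, initial_state).
    """
    (spec_states, spec_edges, s0), (_, impl_edges, s0_impl) = spec, impl
    # Edges of A^M: the specification plus an M self-loop on every state.
    em = set(spec_edges) | {(s, MASK, s) for s in spec_states}
    if (s0, s0_impl) not in rel:
        return False  # condition A
    for (s, s2) in rel:
        for e in sigma:
            spec_moves, impl_moves = succ(em, s, e), succ(impl_edges, s2, e)
            if not all(any((t, t2) in rel for t2 in impl_moves) for t in spec_moves):
                return False  # condition B.1
            if not all(any((t, t2) in rel for t in spec_moves) for t2 in impl_moves):
                return False  # condition B.2
        masks = succ(em, s, MASK)
        for f in faults:
            if not all(any((t, t2) in rel for t in masks)
                       for t2 in succ(impl_edges, s2, f)):
                return False  # condition B.3
    return True


# Hypothetical systems: the specification loops on "a"; after fault "F" the
# first implementation still offers "a", the second one only offers "b".
sigma, faults = {"a", "b"}, {"F"}
spec = ({"p"}, {("p", "a", "p")}, "p")
impl_ok = ({"q0", "q1"},
           {("q0", "a", "q0"), ("q0", "F", "q1"), ("q1", "a", "q0")}, "q0")
impl_bad = ({"q0", "q1"},
            {("q0", "a", "q0"), ("q0", "F", "q1"), ("q1", "b", "q1")}, "q0")
candidate = {("p", "q0"), ("p", "q1")}
```

In this toy setting `candidate` is a masking relation for `impl_ok` but not for `impl_bad`, because after the fault the second implementation cannot reproduce the `a` step of the specification.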

#### **3.2 Weak Masking Simulation**

For analysing nontrivial systems, a weak version of the masking simulation relation is needed. The main idea is that a weak masking simulation abstracts away from internal behaviour, which is modeled by a special action $\tau$. Note that internal transitions are common in fault-tolerance: the actions performed as part of a fault-tolerant procedure in a component are usually not observable by the rest of the system.

The *weak transition relation* $\Rightarrow \subseteq S \times (\Sigma \cup \{\tau\} \cup \{M\} \cup \mathcal{F}) \times S$, also denoted as $E_W$, takes into account the *silent* step $\tau$ and is defined as follows:

$$
\stackrel{e}{\Rightarrow} = \begin{cases}
(\stackrel{\tau}{\rightarrow})^* \circ \stackrel{e}{\rightarrow} \circ (\stackrel{\tau}{\rightarrow})^* & \text{if } e \in \Sigma, \\
(\stackrel{e}{\rightarrow})^* & \text{if } e = \tau, \\
\stackrel{e}{\rightarrow} & \text{if } e \in \{M\} \cup \mathcal{F}.
\end{cases}
$$

The symbol $\circ$ stands for composition of binary relations and $(\stackrel{\tau}{\rightarrow})^*$ is the reflexive and transitive closure of the binary relation $\stackrel{\tau}{\rightarrow}$.

Intuitively, if $e \notin \{\tau, M\} \cup \mathcal{F}$, then $s \stackrel{e}{\Rightarrow} s'$ means that there is a sequence of zero or more $\tau$ transitions starting in $s$, followed by one transition labelled by $e$, followed again by zero or more $\tau$ transitions eventually reaching $s'$. Similarly, $s \stackrel{\tau}{\Rightarrow} s'$ states that $s$ can transition to $s'$ via zero or more $\tau$ transitions. In particular, $s \stackrel{\tau}{\Rightarrow} s$ for every $s$. For the case in which $e \in \{M\} \cup \mathcal{F}$, $s \stackrel{e}{\Rightarrow} s'$ is equivalent to $s \xrightarrow{e} s'$ and hence no $\tau$ step is allowed before or after the $e$ transition.
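The three cases of the weak transition relation can be computed by saturating a finite transition system. The following Python sketch shows one way to do this under the assumption that $\tau$ is represented by the string `"tau"` and the masking label by `"M"`; the function names and the four-state example are hypothetical.

```python
TAU, MASK = "tau", "M"


def tau_closure(states, edges):
    """Map each state to the states reachable by zero or more tau steps."""
    reach = {s: {s} for s in states}  # reflexive by construction
    changed = True
    while changed:
        changed = False
        for (u, a, v) in edges:
            if a != TAU:
                continue
            for s in states:
                if u in reach[s] and v not in reach[s]:
                    reach[s].add(v)
                    changed = True
    return reach


def saturate(states, edges, faults):
    """Weak transition relation E_W as a set of triples (s, e, s')."""
    reach = tau_closure(states, edges)
    rigid = {MASK} | set(faults)  # labels that admit no surrounding tau steps
    weak = {(s, TAU, t) for s in states for t in reach[s]}  # case e = tau
    for (u, a, v) in edges:
        if a == TAU:
            continue
        if a in rigid:
            weak.add((u, a, v))  # case e in {M} or F
        else:
            # case e in Sigma: tau* before and after the visible step
            for s in states:
                if u in reach[s]:
                    for t in reach[v]:
                        weak.add((s, a, t))
    return weak


# Hypothetical system: 0 -tau-> 1 -a-> 2 -tau-> 3, and a fault 1 -F-> 3.
states = {0, 1, 2, 3}
edges = {(0, TAU, 1), (1, "a", 2), (2, TAU, 3), (1, "F", 3)}
weak = saturate(states, edges, faults={"F"})
```

In the example, state 0 can weakly perform `a` and end in state 3, whereas the fault `F` is only available from state 1 exactly as in the original system.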

**Definition 2.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$ *be two transition systems with* $\Sigma$ *possibly containing* $\tau$*.* $A'$ *is* weak masking fault-tolerant *with respect to* $A$ *if there is a relation* $\mathbf{M} \subseteq S \times S'$ *between* $A^M$ *and* $A'$ *such that:*

*(A)* $s_0 \mathrel{\mathbf{M}} s'_0$*, and*

*(B) for all* $s \in S$, $s' \in S'$ *with* $s \mathrel{\mathbf{M}} s'$ *and all* $e \in \Sigma \cup \{\tau\}$ *the following holds:*

*(1) if* $(s \xrightarrow{e} t) \in E$ *then* $\exists\, t' \in S' : (s' \stackrel{e}{\Rightarrow} t' \in E'_W \wedge t \mathrel{\mathbf{M}} t')$*;*

*(2) if* $(s' \xrightarrow{e} t') \in E'$ *then* $\exists\, t \in S : (s \stackrel{e}{\Rightarrow} t \in E_W \wedge t \mathrel{\mathbf{M}} t')$*;*

*(3) if* $(s' \xrightarrow{F} t') \in E'$ *for some* $F \in \mathcal{F}$ *then* $\exists\, t \in S : (s \xrightarrow{M} t \in E^M \wedge t \mathrel{\mathbf{M}} t')$*.*

*If such a relation exists, we say that* $A'$ *is a* weak masking fault-tolerant implementation *of* $A$*, denoted by* $A \preceq^w_m A'$*.*

The following theorem establishes a close connection between strong and weak masking simulation. It states that weak masking simulation becomes strong masking simulation whenever the transition relation $\rightarrow$ is replaced by $\Rightarrow$ in the original automata.

**Theorem 2.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$*.* $\mathbf{M} \subseteq S \times S'$ *between* $A^M$ *and* $A'$ *is a weak masking simulation if and only if:*

*(A)* $s_0 \mathrel{\mathbf{M}} s'_0$*, and*

*(B) for all* $s \in S$, $s' \in S'$ *with* $s \mathrel{\mathbf{M}} s'$ *and all* $e \in \Sigma \cup \{\tau\}$ *the following holds:*

*(1) if* $(s \stackrel{e}{\Rightarrow} t) \in E_W$ *then* $\exists\, t' \in S' : (s' \stackrel{e}{\Rightarrow} t' \in E'_W \wedge t \mathrel{\mathbf{M}} t')$*;*

*(2) if* $(s' \stackrel{e}{\Rightarrow} t') \in E'_W$ *then* $\exists\, t \in S : (s \stackrel{e}{\Rightarrow} t \in E_W \wedge t \mathrel{\mathbf{M}} t')$*;*

*(3) if* $(s' \stackrel{F}{\Rightarrow} t') \in E'_W$ *for some* $F \in \mathcal{F}$ *then* $\exists\, t \in S : (s \stackrel{M}{\Rightarrow} t \in E^M_W \wedge t \mathrel{\mathbf{M}} t')$*.*

The proof of this is straightforward following the same ideas of Milner in [25].

A natural way to check weak bisimilarity is to *saturate* the transition system [14,25] and then check strong bisimilarity on the saturated transition system. Similarly, Theorem 2 allows us to compute weak masking simulation by reducing this problem to computing strong masking simulation. Note that $\stackrel{e}{\Rightarrow}$ can be alternatively defined by:

$$\frac{p \xrightarrow{e} q}{p \stackrel{e}{\Rightarrow} q} \qquad \frac{}{p \stackrel{\tau}{\Rightarrow} p} \qquad \frac{p \stackrel{\tau}{\Rightarrow} p_1 \xrightarrow{e} q_1 \stackrel{\tau}{\Rightarrow} q}{p \stackrel{e}{\Rightarrow} q} \; \left(e \notin \{M\} \cup \mathcal{F}\right)$$

As a running example, we consider a memory cell that stores a bit of information and supports reading and writing operations, presented in a state-based form in [11]. A state in this system maintains the current value of the memory cell (m = i, for i = 0, 1), writing allows one to change this value, and reading returns the stored value. Obviously, in this system the result of a reading depends on the value stored in the cell. Thus, a property that one might associate with this model is that the value read from the cell coincides with that of the last writing performed in the system.

A potential fault in this scenario occurs when a cell unexpectedly loses its charge, and its stored value turns into another one (e.g., it changes from 1 to 0 due to charge loss). A typical technique to deal with this situation is *redundancy*: use three memory bits instead of one. Writing operations are performed simultaneously on the three bits. Reading, on the other hand, returns the value that is repeated at least twice in the memory bits; this is known as *voting*.

We take the following approach to model this system. Labels $W_0$, $W_1$, $R_0$, and $R_1$ represent writing and reading operations. Specifically, $W_0$ (resp. $W_1$) writes a zero (resp. one) in the memory, and $R_0$ (resp. $R_1$) reads a zero (resp. one) from the memory. Figure 1 depicts four transition systems. The leftmost one represents the nominal system for this example (denoted as $A$). The second one from the left characterizes the nominal transition system augmented with masking transitions, i.e., $A^M$. The third and fourth transition systems are fault-tolerant implementations of $A$, named $A'$ and $A''$, respectively. Note that $A'$ contains one fault, while $A''$ considers two faults. Both implementations use triple redundancy; intuitively, state $t_0$ contains the three bits with value zero and $t_1$ contains the three bits with value one. Moreover, state $t_2$ is reached when one of the bits was flipped (either 001, 010 or 100). In $A''$, state $t_3$ is reached after a second bit is flipped (either 011 or 101 or 110) starting from state $t_0$. It is straightforward to see that there exists a relation of masking fault-tolerance between $A^M$ and $A'$, as witnessed by the relation $\mathbf{M} = \{(s_0, t_0), (s_1, t_1), (s_0, t_2)\}$. It is routine to check that $\mathbf{M}$ satisfies the conditions of Definition 1. On the other hand, there does not exist a masking relation between $A^M$ and $A''$ because state $t_3$ needs to be related to state $s_0$ in any masking relation. This state can only be reached by executing faults, which are necessarily masked with $M$-transitions. However, note that, in state $t_3$, we can read a 1 (transition $t_3 \xrightarrow{R_1} t_3$) whereas, in state $s_0$, we can only read a 0.

**Fig. 1.** Transition systems for the memory cell.
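The memory cell example can be replayed with a small fixpoint computation: start from all pairs of states and repeatedly discard pairs that violate conditions B.1 to B.3 of Definition 1. The Python sketch below does this. Since Fig. 1 is not reproduced here, the transition sets are an assumed encoding derived from the textual description (a single fault label `F`, reads as self-loops, writes moving to $t_0$ or $t_1$), and all identifiers are hypothetical.

```python
MASK = "M"


def succ(edges, s, e):
    """Targets of e-labelled transitions leaving s."""
    return {t for (u, a, t) in edges if u == s and a == e}


def greatest_masking_relation(spec_states, spec_edges, impl_states, impl_edges,
                              sigma, faults):
    """Largest relation satisfying conditions B.1-B.3 of Definition 1."""
    em = set(spec_edges) | {(s, MASK, s) for s in spec_states}  # edges of A^M
    rel = {(s, t) for s in spec_states for t in impl_states}

    def ok(s, s2):
        for e in sigma:
            spec_moves, impl_moves = succ(em, s, e), succ(impl_edges, s2, e)
            if not all(any((t, t2) in rel for t2 in impl_moves) for t in spec_moves):
                return False  # B.1 fails
            if not all(any((t, t2) in rel for t in spec_moves) for t2 in impl_moves):
                return False  # B.2 fails
        masks = succ(em, s, MASK)
        for f in faults:
            if not all(any((t, t2) in rel for t in masks)
                       for t2 in succ(impl_edges, s2, f)):
                return False  # B.3 fails
        return True

    while True:
        keep = {(s, s2) for (s, s2) in rel if ok(s, s2)}
        if keep == rel:
            return rel
        rel = keep


# Assumed encoding of the systems of Fig. 1 (not taken verbatim from the figure).
SIGMA, FAULTS = {"W0", "W1", "R0", "R1"}, {"F"}
A_STATES = {"s0", "s1"}
A_EDGES = {("s0", "R0", "s0"), ("s0", "W0", "s0"), ("s0", "W1", "s1"),
           ("s1", "R1", "s1"), ("s1", "W1", "s1"), ("s1", "W0", "s0")}
A1_STATES = {"t0", "t1", "t2"}
A1_EDGES = {("t0", "R0", "t0"), ("t0", "W0", "t0"), ("t0", "W1", "t1"),
            ("t1", "R1", "t1"), ("t1", "W1", "t1"), ("t1", "W0", "t0"),
            ("t0", "F", "t2"),
            ("t2", "R0", "t2"), ("t2", "W0", "t0"), ("t2", "W1", "t1")}
A2_STATES = A1_STATES | {"t3"}
A2_EDGES = A1_EDGES | {("t2", "F", "t3"), ("t3", "R1", "t3"),
                       ("t3", "W0", "t0"), ("t3", "W1", "t1")}

rel_one_fault = greatest_masking_relation(A_STATES, A_EDGES, A1_STATES, A1_EDGES,
                                          SIGMA, FAULTS)
rel_two_faults = greatest_masking_relation(A_STATES, A_EDGES, A2_STATES, A2_EDGES,
                                           SIGMA, FAULTS)
```

A masking relation exists exactly when the pair of initial states survives the pruning. Under the assumed encoding, the one-fault implementation keeps the three pairs of the relation given above, while for the two-fault implementation the pair $(s_0, t_0)$ is eventually removed because $t_3$ cannot be related to $s_0$.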

#### **3.3 Masking Simulation Game**

We define a masking simulation game for two transition systems (the specification of the nominal system and its fault-tolerant implementation) that captures masking fault-tolerance. We first define the masking game graph where we have two players named by convenience the *refuter* (R) and the *verifier* (V ).

**Definition 3.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$*. The* strong masking game graph $\mathcal{G}_{A^M,A'} = \langle S^G, S_R, S_V, \Sigma^G, E^G, s_0^G \rangle$ *for two players is defined as follows:*


*and* E<sup>G</sup> *is the minimal set satisfying:*


The intuition of this game is as follows. The refuter chooses transitions of either the specification or the implementation to play, and the verifier tries to match her choice; this is similar to the bisimulation game [28]. However, when the refuter chooses a fault, the verifier must match it with a masking transition ($M$). The intuitive reading of this is that the fault-tolerant implementation masked the fault in such a way that the occurrence of this fault cannot be noticed from the users' side. $R$ wins if the game reaches the error state, i.e., $s_{err}$. On the other hand, $V$ wins when $s_{err}$ is not reached during the game. (This is basically a reachability game [26].)

A *weak masking game graph* $\mathcal{G}^W_{A^M,A'}$ is defined in the same way as the strong masking game graph in Definition 3, with the exception that $\Sigma_M$ and $\Sigma_{\mathcal{F}}$ may contain $\tau$, and the set of labelled transitions (denoted as $E^G_W$) is now defined using the weak transition relations (i.e., $E_W$ and $E'_W$) from the respective transition systems.

Figure 2 shows a part of the strong masking game graph for the running example considering the transition systems $A^M$ and $A''$. We can clearly observe on the game graph that the verifier cannot mimic the transition $(s_0, \#, t_3, R) \xrightarrow{R_1^2} (s_0, R_1^2, t_3, V)$ selected by the refuter, which reads a 1 at state $t_3$ on the fault-tolerant implementation. This is because the verifier can only read a 0 at state $s_0$. Then, $s_{err}$ is reached and the refuter wins.

As expected, there is a strong masking simulation between $A$ and $A'$ if and only if the verifier has a winning strategy in $\mathcal{G}_{A^M,A'}$.

**Theorem 3.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}}, E', s'_0 \rangle$*.* $A \preceq_m A'$ *iff the verifier has a winning strategy for the strong masking game graph* $\mathcal{G}_{A^M,A'}$*.*

By Theorems 2 and 3, the result carries over to the weak masking game.

**Theorem 4.** *Let* $A = \langle S, \Sigma \cup \{\tau\}, E, s_0 \rangle$ *and* $A' = \langle S', \Sigma_{\mathcal{F}} \cup \{\tau\}, E', s'_0 \rangle$*.* $A \preceq^w_m A'$ *iff the verifier has a winning strategy for the weak masking game graph* $\mathcal{G}^W_{A^M,A'}$*.*

Using the standard properties of reachability games we get the following property.

**Theorem 5.** *For any* $A$ *and* $A'$*, the strong (resp. weak) masking game graph* $\mathcal{G}_{A^M,A'}$ *(resp.* $\mathcal{G}^W_{A^M,A'}$*) can be determined in time* $O(|E^G|)$ *(resp.* $O(|E^G_W|)$*).*

**Fig. 2.** Part of the masking game graph for memory cell model with two faults

The set of winning states for the refuter can be defined in a standard way from the error state [26]. We adapt ideas in [26] to our setting. For $i, j \geq 0$, sets $U_i^j$ are defined as follows:

$$\begin{aligned} U\_i^0 &= U\_0^j = \emptyset, \\ U\_1^1 &= \{ s\_{err} \}, \\ U\_{i+1}^{j+1} &= \{ v' \mid v' \in S\_R \land post(v') \cap U\_{i+1}^j \neq \emptyset \} \\ &\cup \{ v' \mid v' \in S\_V \land post(v') \subseteq \bigcup\_{j' \le j} U\_{i+1}^{j'} \land post(v') \cap U\_{i+1}^j \neq \emptyset \land \pi\_2(v') \notin \mathcal{F} \} \\ &\cup \{ v' \mid v' \in S\_V \land post(v') \subseteq \bigcup\_{i' \le i, j' \le j} U\_{i'}^{j'} \land post(v') \cap U\_i^j \neq \emptyset \land \pi\_2(v') \in \mathcal{F} \} \end{aligned} \tag{1}$$

then $U^k = \bigcup_{i \geq 0} U_i^k$ and $U = \bigcup_{k \geq 0} U^k$. Intuitively, the subindex $i$ in $U_i^k$ indicates that $s_{err}$ is reached after at most $i - 1$ faults occurred. The following lemma is straightforwardly proven using standard techniques of reachability games [9].

**Lemma 1.** *The refuter has a winning strategy in* GA*M*,A- *(or* <sup>G</sup><sup>W</sup> A*M*,A- *) iff* <sup>s</sup>init <sup>∈</sup> U<sup>k</sup>*, for some* k*.*

#### **3.4 Quantitative Masking**

In this section, we extend the strong masking simulation game introduced above with quantitative objectives to define the notion of masking fault-tolerance distance. Note that we use the attribute "quantitative" in a non-probabilistic sense.

**Definition 4.** *For transition systems* A *and* A *, the* quantitative strong masking game graph QA*M*,A- = -SG, SR, S<sup>V</sup> , ΣG, EG, s<sup>G</sup> <sup>0</sup> , v<sup>G</sup> *is defined as follows:*

$$\begin{aligned} &\mathcal{G}_{A^M,A'} = \langle S^G, S_R, S_V, \Sigma^G, E^G, s_0^G \rangle \text{ is defined as in Definition 3}, \\ &v^G(s \xrightarrow{e} s') = \langle \chi_{\mathcal{F}}(e), \chi_{s_{err}}(s') \rangle \end{aligned}$$

*where* $\chi_{\mathcal{F}}$ *is the characteristic function over set* $\mathcal{F}$*, returning* $1$ *if* $e \in \mathcal{F}$ *and* $0$ *otherwise, and* $\chi_{s_{err}}$ *is the characteristic function over the singleton set* $\{s_{err}\}$*.*

Note that the cost function returns a pair of numbers instead of a single number. It is straightforward to encode this pair into a single number, but we do not do so here for the sake of clarity. We remark that the *quantitative weak masking game graph* $\mathcal{Q}^W_{A^M,A'}$ is defined in the same way as the game graph defined above but using the weak masking game graph $\mathcal{G}^W_{A^M,A'}$ instead of $\mathcal{G}_{A^M,A'}$.

Given a quantitative strong masking game graph with the weight function $v^G$ and a play $\rho = \rho_0 \sigma_0 \rho_1 \sigma_1 \rho_2 \dots$, for all $i \geq 0$, let $v_i = v^G(\rho_i \xrightarrow{\sigma_i} \rho_{i+1})$. We define the *masking payoff function* as follows:

$$f\_m(\rho) = \lim\_{n \to \infty} \frac{\text{pr}\_1(v\_n)}{1 + \sum\_{i=0}^n \text{pr}\_0(v\_i)},$$

which is proportional to the inverse of the number of masking movements made by the verifier. To see this, note that the numerator of $\frac{\text{pr}_1(v_n)}{1 + \sum_{i=0}^n \text{pr}_0(v_i)}$ is 1 once the error state is reached; for paths that never reach the error state, the formula returns 0. Furthermore, if the error state is reached, the denominator counts the number of fault transitions taken up to the error state. All of them, except the last one, were masked successfully. The last fault, although the verifier attempted to mask it, eventually leads to the error state. That is, the transitions whose first cost component is 1 are those corresponding to faults; all other transitions have first component 0. Notice also that if $s_{err}$ is reached in $v_n$ without the occurrence of any fault, the nominal part of the implementation does not match the nominal specification, in which case $\frac{\text{pr}_1(v_n)}{1 + \sum_{i=0}^n \text{pr}_0(v_i)} = 1$. The refuter wants to maximize the value of any run, that is, she will try to execute faults leading to the state $s_{err}$. In contrast, the verifier wants to avoid $s_{err}$, so she will try to mask faults with actions that take her away from the error state. More precisely, the value of the quantitative strong masking game for the refuter is defined as $val_R(\mathcal{Q}_{A^M,A'}) = \sup_{\pi_R \in \Pi_R} \inf_{\pi_V \in \Pi_V} f_m(out(\pi_R, \pi_V))$. Analogously, the value of the game for the verifier is defined as $val_V(\mathcal{Q}_{A^M,A'}) = \inf_{\pi_V \in \Pi_V} \sup_{\pi_R \in \Pi_R} f_m(out(\pi_R, \pi_V))$. Then, we define the value of the quantitative strong masking game, denoted by $val(\mathcal{Q}_{A^M,A'})$, as the value of the game for either the refuter or the verifier, i.e., $val(\mathcal{Q}_{A^M,A'}) = val_R(\mathcal{Q}_{A^M,A'}) = val_V(\mathcal{Q}_{A^M,A'})$. This can be done because quantitative strong masking games are determined, as we prove below in Theorem 6.
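As an illustration of the payoff function, the following sketch evaluates $f_m$ on a finite prefix of a play. It assumes that a play which reaches the error state stays there without further faults, so that the limit coincides with the value at the last recorded step. The names `Step` and `maskingPayoff` are hypothetical and are not taken from MaskD.

```cpp
#include <cassert>
#include <vector>

// One step of a play (hypothetical representation):
// pr0 is 1 on a fault transition and 0 otherwise,
// pr1 is 1 once the error state has been reached and 0 otherwise.
struct Step {
  int pr0;
  int pr1;
};

// Evaluates pr1(v_n) / (1 + sum_{i=0..n} pr0(v_i)) at the last step of the
// prefix. Assumption: after reaching the error state the play stays there
// and takes no further faults, so this value equals the limit.
double maskingPayoff(const std::vector<Step> &play) {
  if (play.empty())
    return 0.0;
  int faults = 0;
  for (const Step &s : play) {
    assert((s.pr0 == 0 || s.pr0 == 1) && (s.pr1 == 0 || s.pr1 == 1));
    faults += s.pr0;
  }
  return static_cast<double>(play.back().pr1) / (1 + faults);
}
```

For a play with two faults in which the second fault leads to the error state, the sketch yields 1/(1 + 2) = 1/3; for a play that never reaches the error state, it yields 0.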

**Definition 5.** *Let* $A$ *and* $A'$ *be transition systems. The* strong masking distance *between* $A$ *and* $A'$*, denoted by* $\delta_m(A, A')$*, is defined as* $\delta_m(A, A') = val(\mathcal{Q}_{A^M,A'})$*.*

We would like to remark that the *weak masking distance* $\delta_m^W$ is defined in the same way for the quantitative weak masking game graph $\mathcal{Q}^W_{A^M,A'}$. Roughly speaking, we are interested in measuring the number of faults that can be masked. The value of the game is essentially determined by the faulty and masking labels on the game graph and by how the players can find a strategy that leads to (or avoids) the state $s_{err}$, regardless of whether silent actions are present.

In the following, we state some basic properties of this kind of game. As already anticipated, quantitative strong masking games are determined:

**Theorem 6.** *For any quantitative strong masking game* $\mathcal{Q}_{A^M,A'}$ *with payoff function* $f_m$*:*

$$\inf\_{\pi\_V \in \Pi\_V} \sup\_{\pi\_R \in \Pi\_R} f\_m(out(\pi\_R, \pi\_V)) = \sup\_{\pi\_R \in \Pi\_R} \inf\_{\pi\_V \in \Pi\_V} f\_m(out(\pi\_R, \pi\_V))$$

The value of the quantitative strong masking game can be calculated as stated below.

**Theorem 7.** *Let* $\mathcal{Q}_{A^M,A'}$ *be a quantitative strong masking game. Then,* $val(\mathcal{Q}_{A^M,A'}) = \frac{1}{w}$*, with* $w = \min\{i \mid \exists j : s_{init} \in U_i^j\}$*, whenever* $s_{init} \in U$*, and* $val(\mathcal{Q}_{A^M,A'}) = 0$ *otherwise, where the sets* $U_i^j$ *and* $U$ *are defined in Eq. (1).*

Note that the sets $U_i^j$ can be calculated using a bottom-up breadth-first search from the error state. Thus, the strategies for the refuter and the verifier can be defined using these sets, without taking into account the history of the play. That is, we have the following theorem:

**Theorem 8.** *Players* $R$ *and* $V$ *have memoryless winning strategies for* $\mathcal{Q}_{A^M,A'}$*.*

Theorems 6, 7, and 8 apply as well to $\mathcal{Q}^W_{A^M,A'}$. The following theorem states the complexity of determining the value of the two types of games.

**Theorem 9.** *The quantitative strong (weak) masking game can be determined in time* $O(|S^G| + |E^G|)$ *(resp.* $O(|S^G| + |E^G_W|)$*).*

Theorems 5 and 9 describe the complexity of solving the quantitative and standard masking games. However, in practice, one needs to bear in mind that $|S^G| = |S| * |S'|$ and $|E^G| = |E| + |E'|$, so constructing the game takes $O(|S|^2 * |S'|^2)$ steps in the worst case. Additionally, for the weak games, the transitive closure of the original model needs to be computed, which for the best known algorithm takes $O(\max(|S|, |S'|)^{2.3727})$ [30].

By using $\mathcal{Q}^W_{A^M,A'}$ instead of $\mathcal{Q}_{A^M,A'}$ in Definition 5, we can define the *weak masking distance* $\delta_m^W$. The next theorem states that, if $A$ and $A'$ are at distance 0, there is a strong (or weak) masking simulation between them.

**Theorem 10.** *For any transition systems* $A$ *and* $A'$*: (i)* $\delta_m(A, A') = 0$ *iff* $A \preceq_m A'$*, and (ii)* $\delta_m^W(A, A') = 0$ *iff* $A \preceq_m^w A'$*.*

This follows from Theorem 7. Noting that $A \preceq_m A$ (and $A \preceq_m^w A$) for any transition system $A$, we obtain $\delta_m(A, A) = 0$ (resp. $\delta_m^W(A, A) = 0$) by Theorem 10, i.e., both distances are reflexive.

For our running example, the masking distance is 1/3 with a redundancy of 3 bits and considering two faults. This means that only one fault can be masked by this implementation. We can prove a version of the triangle inequality for our notion of distance.

**Theorem 11.** *Let* $A = \langle S, \Sigma, E, s_0 \rangle$*,* $A' = \langle S', \Sigma_{F'}, E', s'_0 \rangle$*, and* $A'' = \langle S'', \Sigma_{F''}, E'', s''_0 \rangle$ *be transition systems such that* $F' \subseteq F''$*. Then* $\delta_m(A, A'') \leq \delta_m(A, A') + \delta_m(A', A'')$ *and* $\delta_m^W(A, A'') \leq \delta_m^W(A, A') + \delta_m^W(A', A'')$*.*

Reflexivity and the triangle inequality imply that both masking distances are directed semi-metrics [7,10]. Moreover, the triangle inequality has practical applications. When developing critical software, it is quite common to develop a first version of the software that takes into account some anticipated faults. Later, after testing and running the system, more plausible faults may be observed. Consequently, the system is modified with additional fault-tolerant capabilities to overcome them. Theorem 11 states that incrementally measuring the masking distance between these different versions of the software provides an upper bound on the actual distance between the nominal system and its last fault-tolerant version. That is, if the sum of the distances obtained between the different versions is a small number, then we can ensure that the final system will exhibit an acceptable masking tolerance to faults w.r.t. the nominal system.

#### **4 Experimental Evaluation**

The approach described in this paper has been implemented in a Java tool called MaskD: Masking Distance Tool [1]. MaskD takes as input a nominal model and its fault-tolerant implementation, and produces as output the masking distance between them. The input models are specified using the guarded command language introduced in [3], a simple programming language commonly used for describing fault-tolerant algorithms. More precisely, a program is a collection of processes, where each process is composed of a collection of actions of the form Guard → Command, where Guard is a boolean condition over the current state of the program and Command is a collection of basic assignments. The language also allows users to label an action as internal (i.e., τ actions). Moreover, some actions are usually used to represent faults. The tool has several additional features; for instance, it can print the traces to the error state or start a simulation from the initial state.

We report in Table 1 the results of the masking distance for multiple instances of several case studies. These are: a Redundant Cell Memory (our running example), N-Modular Redundancy (a standard example of a fault-tolerant system [27]), a variation of the Dining Philosophers problem [13], the Byzantine Generals problem introduced by Lamport et al. [22], and the Bounded Retransmission Protocol (a well-known example of a fault-tolerant protocol [16]).

Some remarks help to interpret the results. For a 3-bit memory, the masking distance is 0.333. The main reason is that, in the worst case, the faulty model withstands only two faults (in this example, a fault is an unexpected change of a bit value): the first is masked, whereas the second makes it fail to replicate the nominal behaviour (i.e., reading the majority value). The result thus follows from the definition of masking distance, taking into account the occurrence of two faults. The situation is similar for the other instances of this problem with more redundancy.

N-Modular Redundancy consists of N systems that each perform a process and whose results are processed by a majority-voting system to produce a single output. Assuming a single perfect voter, we have evaluated this case study for different numbers of modules. Note that the distance measures for this case study are similar to those of the memory example.

For the dining philosophers problem, we have adopted the odd/even philosophers implementation (which prevents deadlock), i.e., there are $n - 1$ *even* philosophers that pick the right fork first, and 1 *odd* philosopher that picks the left fork first. The fault we consider in this case occurs when an even philosopher behaves as an odd one; this could be the case of a Byzantine fault. For two philosophers the masking distance is 0.5, since a single fault leads to a deadlock; when more philosophers are added, this distance becomes smaller.

Another interesting example of a fault-tolerant system is the Byzantine generals problem, introduced originally by Lamport et al. [22]. This is a consensus problem, where we have a general with $n - 1$ lieutenants. The communication between the general and his lieutenants is performed through messengers. The general may decide to attack an enemy city or to retreat; then, he sends the order to his lieutenants. Some of the lieutenants might be traitors. We assume that the messages are delivered correctly and that all the lieutenants can communicate directly with each other. In this scenario they can recognize who is sending a message. Faults can convert loyal lieutenants into traitors (Byzantine faults). As a consequence, traitors might deliver false messages or fail to forward a message that they received. The loyal lieutenants must agree on attacking or retreating after m + 1 rounds of communication, where m is the maximum number of traitors.

The Bounded Retransmission Protocol (BRP) is a well-known industrial case study in software verification. While all the other case studies were treated as toy examples and analyzed with $\delta_m$, the BRP was modeled closer to the implementation following [16], considering the different components (sender, receiver, and models of the channels). To analyze such a complex model we have instead used the weak masking distance $\delta_m^W$. We have calculated the masking distance for the bounded retransmission protocol with 1, 3 and 5 chunks, denoted BRP(1), BRP(3) and BRP(5), respectively. We observe that the distance values are not affected by the number of chunks to be sent by the protocol. This is expected because the masking distance depends on the redundancy added to mask the faults, which in this case depends on the number of retransmissions.

**Table 1.** Results of the masking distance for the case studies.

We have run our experiments on a MacBook Air with a 1.3 GHz Intel Core i5 processor and 4 GB of memory. The tool and case studies for reproducing the results are available in the tool repository.

### **5 Related Work**

In recent years, there has been a growing interest in the quantitative generalizations of the boolean notion of correctness and the corresponding quantitative verification questions [4,6,17,18]. The framework described in [6] is the closest related work to our approach. The authors generalize the traditional notion of simulation relation to three different versions of simulation distance: *correctness*, *coverage*, and *robustness*. These are defined using quantitative games with *discounted-sum* and *mean-payoff* objectives, two well-known cost functions. Similarly to that work, we also consider distances between purely discrete (non-probabilistic, untimed) systems. Correctness and coverage distances are concerned with the nominal part of the systems, so faults play no role in them. On the other hand, robustness distance measures how many unexpected errors can be performed by the implementation in such a way that the resulting behavior is tolerated by the specification. So, it can be used to analyze the resilience of the implementation. Note that robustness distance can only be applied to correct implementations, that is, implementations that preserve the behavior of the specification but perhaps do not cover all its behavior. As noted in [6], bisimilarity sometimes implies a distance of 1. In this sense, a greater degree of robustness (as defined in [6]) is achieved by pruning critical points from the specification. Furthermore, the errors considered in that work are transitions mimicking the original ones but with different labels. In contrast, in our approach we consider that faults are injected into the fault-tolerant implementation, where their behaviors are not restricted by the nominal system. This follows the idea of model extension in fault-tolerance, where faulty behavior is added to the nominal system. Further, note that when no faults are present, the masking distance between the specification and the implementation is 0 when they are bisimilar, and 1 otherwise. It is useful to note that the robustness distance of [6] is not reflexive. We believe that all these definitions of distance between systems capture different notions useful for software development, and that they can be used together, in a complementary way, to obtain an in-depth evaluation of fault-tolerant implementations.

### **6 Conclusions and Future Work**

In this paper, we presented a notion of masking fault-tolerance distance between systems, built on a characterization of masking tolerance via simulation relations and a corresponding game representation with quantitative objectives. Our framework is well suited to support engineers in the analysis and design of fault-tolerant systems. More precisely, we have defined a computable masking distance function with which an engineer can measure the masking tolerance of a given fault-tolerant implementation, i.e., the number of faults that can be masked. Thereby, the engineer can measure and compare the masking fault-tolerance distance of alternative fault-tolerant implementations, and select the one that best fits her preferences.

There are many directions for future work. We have only defined a notion of fault-tolerance distance for masking fault-tolerance; similar notions of distance can be defined for other levels of fault-tolerance, such as failsafe and non-masking. Also, we have focused on non-quantitative models. However, metrics defined on probabilistic models, where the rate of fault occurrences is explicitly represented, could give a more accurate notion of fault tolerance.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

# **PhASAR: An Inter-procedural Static Analysis Framework for C/C++**

Philipp Dominik Schubert<sup>1</sup>, Ben Hermann<sup>1</sup>, and Eric Bodden<sup>1,2</sup>

<sup>1</sup> Heinz Nixdorf Institute, Paderborn University, 33102 Paderborn, Germany {philipp.schubert,ben.hermann,eric.bodden}@upb.de <sup>2</sup> Fraunhofer IEM, 33102 Paderborn, Germany

**Abstract.** Static program analysis is used to automatically determine program properties, or to detect bugs or security vulnerabilities in programs. It can be used as a stand-alone tool or to aid compiler optimization as an intermediary step. Developing precise, inter-procedural static analyses, however, is a challenging task, due to the algorithmic complexity, implementation effort, and the threat of state explosion which leads to unsatisfactory performance. Software written in C and C++ is notoriously hard to analyze because of the deliberately unsafe type system, unrestricted use of pointers, and (for C++) virtual dispatch. In this work, we describe the design and implementation of the LLVM-based static analysis framework PhASAR for C/C++ code. PhASAR allows data-flow problems to be solved in a fully automated manner. It provides class hierarchy, call-graph, points-to, and data-flow information, hence requiring analysis developers only to specify a definition of the data-flow problem. PhASAR thus hides the complexity of static analysis behind a high-level API, making static program analysis more accessible and easy to use. PhASAR is available as an open-source project. We evaluate PhASAR's scalability during whole-program analysis. Analyzing 12 real-world programs using a taint analysis written in PhASAR, we found PhASAR's abstractions and their implementations to provide a whole-program analysis that scales well to real-world programs. Furthermore, we peek into the details of analysis runs, discuss our experience in developing static analyses for C/C++, and present possible future improvements. Data or code related to this paper is available at: [34].

**Keywords:** Inter-procedural static analysis · LLVM · C/C++

### **1 Introduction**

Programming languages from the C/C++ family are chosen as the implementation language in a multitude of projects, especially where a direct interface with the operating system or hardware components is important. Large portions of any operating system and virtual machine (such as the Java VM) are written in C or C++. The reason is often the degree of control these languages give the programmer, which allows for the creation of very efficient programs but also comes with the obligation to use these features correctly to avoid introducing bugs or opening the program to security vulnerabilities.

To aid developers in creating correct and secure software, a multitude of checks have been included in compilers such as GCC [4] and Clang [2]. Additional tools such as Cppcheck [12], clang-tidy [9], or the Clang Static Analyzer [8] provide further means to check for unwanted behavior. Compiler check passes and additional checkers both use static program analysis to provide warnings to their users. However, to create warnings in a timely fashion, these tools use comparatively simple analyses that either check only simple properties or suffer from a large number of false or missed warnings, due to the imprecision or unsoundness of the analysis used.

For programs written in Java, program-analysis frameworks like Soot [16], WALA [33], and Doop [13] are available which allow for a more precise data-flow analysis to determine more intricate program problems. Furthermore, algorithmic frameworks such as *Interprocedural Finite Distributive Subset (IFDS)* [24], *Interprocedural Distributive Environments (IDE)* [26], or *Weighted Pushdown Systems (WPDS)* [25] can be used to describe data-flow problems and efficiently compute their possible solutions.

So far, such implementations have not been openly available for programs written in C/C++. This work thus presents the novel program-analysis framework PhASAR, an extension to the LLVM compiler infrastructure [17]. In its inception, we used our experience in developing previous such frameworks for JVM-based languages, namely Soot [16] and OPAL [14], to design a flexible framework that can be adapted to several different types of client analyses. Besides solving data-flow problems, PhASAR can be used to achieve other related goals as well, for instance, call-graph construction, or the computation of points-to information. Its features can be used independently and be included into other software. PhASAR's implementation is written entirely in C++ and is available as open source under the permissive MIT license [23].

PhASAR is intended to be used as a static analyzer. Therefore, it does not substitute but rather complements features of the LLVM toolchain, and it also provides for analyses that would be prohibitively expensive during compilation.

This paper makes the following contributions:


#### **2 Related Work**

There are several established and well-maintained tools and frameworks for the Java ecosystem. Frameworks from academia include Soot [16], which is a static analysis framework that allows call-graph construction, computation of points-to information and solving of data-flow problems for Java and Android. Soot does not support inter-procedural data-flow analyses directly. However, a user can solve such problems using the Heros [7] extension that implements an IFDS/IDE solver. The WALA [33] framework provides similar functionalities for Java bytecode, JavaScript and Python. OPAL [14] allows for the implementation of abstract interpretations of Java bytecode. The manipulation of bytecode is also supported. A declarative approach is implemented by the Doop framework [13]. Doop uses a declarative rule set to encode an analysis and solves it using a logic-based Datalog solver. The framework allows for pointer analysis of Java programs and implements a range of algorithms that can be used for context-insensitive, call-site-sensitive and object-sensitive analyses.

Tooling for C/C++ includes Cppcheck [12], which aims for a result without false positives and allows simple rules to be encoded as well as more powerful add-ons to be developed. The clang-tidy tool [9] provides built-in checks for style validation, detection of interface misuse as well as bug-finding using simple rules, but can be extended by a user. Checks can be written on preprocessor level using callbacks or on AST level using AST matchers that can be specified using an embedded domain-specific language (EDSL). The Clang Static Analyzer [8] uses symbolic execution and allows custom checks to be written. The SVF [31] framework computes points-to information for constructing sparse value flow and memory static single assignment (SSA). Hence, it can be used for analyses that rely on that information, such as memory leak detection or null pointer analysis. Additionally, more precise pointer analyses can be built on top of SVF's results. However, as the computation of memory SSA requires a significant amount of computation, using SVF may not pay off for problems that can be encoded using distributive frameworks, which allow fast, summary-based solutions.

There are also commercial, closed-source tools for static analysis such as CodeSonar [10] and Coverity [11], both of which support analyses for C, C++, Java and other languages. Whereas these products are attractive to industry as they provide polished user interfaces, they are not usable for evaluating novel algorithms and ideas in static-analysis research.

### **3 Data-Flow Analysis**

Data-flow analysis is a form of static analysis that works by propagating information about the property of interest (the data-flow facts) through a model of the program, typically a control-flow graph, and captures the interactions of the flow facts with the program. The interaction of a single statement s with a data-flow fact is described by a flow function. There are two orthogonal approaches [27] for solving inter-procedural (whole-program) data-flow problems: the call-strings approach and the functional approach. For the call-strings approach we refer the reader to related work [15,27]. In the following we briefly present the functional approach using a linear constant propagation that we apply to a small program shown in Listing 1.1. A linear constant propagation is a data-flow analysis that precisely tracks, through the program, variables with constant values and variables that depend linearly (c = a · x + b, with a, b constant values) on constant values. Non-linear dependencies are over-approximated. In our example, we restrict the analysis to keep track of integer constants only. Such an analysis can be used to perform program optimizations by replacing variables with their constant values and folding expressions that use constant values, possibly also removing dead code eventually. The analysis would be able to optimize the program shown in Listing 1.1 to **int** main() {**return** 12; }.

```
 1 int inc(int p) { return ++p; }
 2 int main() {
 3   int a = 1;
 4   int b = 2;
 5   int c = 3;
 6   a = inc(a); // cs1
 7   b = inc(b); // cs2
 8   c = b * 4;
 9   return c;
10 }
```
**Listing 1.1.** Program P

If the flow functions of the problem to be solved are monotone and distributive over the merge operator, it can be encoded using *Inter-procedural Finite Distributive Subset* (IFDS) or *Inter-procedural Distributive Environments* (IDE). Unlike the call-strings approach, which is limited to a certain level of context-sensitivity (commonly denoted as k), IFDS and IDE are fully context-sensitive, i.e., k = ∞. In IFDS [24] and its generalization IDE [26], a data-flow problem is transformed into a graph reachability problem. The reachability is computed using the so-called exploded super-graph (ESG). If a node $(s_i, d_i)$ in the ESG is reachable from a special tautological node Λ, the data-flow fact represented by $d_i$ holds at statement $s_i$. The ESG is built according to the flow functions, which can be represented as bipartite graphs. Functions for generating (Gen) and destroying (Kill) data-flow facts can be encoded into flow functions, making the framework compatible with more traditional approaches to data-flow analysis. The composition f ◦ g of two functions can be computed by composing their corresponding bipartite graphs, i.e., merging the nodes of g with the corresponding nodes of the domain of f. The ESG for the complete program is constructed by replacing every node of the inter-procedural control-flow graph (ICFG) with the graph representation of the corresponding flow function. Scalability issues due to context-sensitivity are mitigated through summaries that are computed by composition of all bipartite graphs of a function for a given input. These summaries are reused for subsequent calls to an already summarized function.
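The representation of flow functions as bipartite graphs and their composition can be sketched as follows. This is a minimal illustration, not PhASAR's implementation: facts are plain integers, the value 0 stands in for the special fact Λ, and the names `FlowGraph`, `applyFlow`, and `compose` are hypothetical.

```cpp
#include <set>
#include <utility>

using Fact = int;                   // assumption: 0 plays the role of Lambda
using Edge = std::pair<Fact, Fact>; // (fact before, fact after)
using FlowGraph = std::set<Edge>;   // bipartite graph of one flow function

// Applies a flow function to the set of facts that hold before a statement.
std::set<Fact> applyFlow(const FlowGraph &f, const std::set<Fact> &in) {
  std::set<Fact> out;
  for (const Edge &e : f)
    if (in.count(e.first))
      out.insert(e.second);
  return out;
}

// Computes the composition "first g, then f" by joining the target nodes
// of g with the matching source nodes of f.
FlowGraph compose(const FlowGraph &f, const FlowGraph &g) {
  FlowGraph result;
  for (const Edge &eg : g)
    for (const Edge &ef : f)
      if (eg.second == ef.first)
        result.emplace(eg.first, ef.second);
  return result;
}
```

In this encoding, a Gen of fact 1 is an edge (0, 1) from Λ, and a Kill of fact 2 is the absence of the edge (2, 2). Applying the composed graph to a set of facts gives the same result as applying g and then f, which is what makes summaries reusable.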

The complexity of the IFDS algorithm is $O(|N| \cdot |D|^3)$, where $|N|$ is the number of nodes of the ICFG (or the number of program statements) and $|D|$ is the size of the data-flow domain that is used. To make the analysis scale, the domain D should thus be kept small.
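To illustrate why the size of D dominates, consider a fixed program and a data-flow domain that doubles in size. Ignoring constant factors, the worst-case bound changes as follows:

$$|N| \cdot (2|D|)^3 = 8 \cdot |N| \cdot |D|^3$$

That is, the bound grows eightfold, whereas doubling the number of statements $|N|$ only doubles it.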

In IDE, a generalization of the IFDS framework, the edges of the ESG are additionally annotated with so-called *edge functions*. With the help of those edge functions, an additional value-computation problem can be encoded, which is solved while performing the reachability computation. The complexity of the IDE algorithm is the same as for the IFDS algorithm. Many problems can be solved more efficiently by encoding them with IDE rather than IFDS, because IDE uses two domains to solve a given problem. In addition to the domain D of the data-flow facts, the value computation problem is formulated over a second value domain V , which can be large, even infinite. Crucially, for a given fixed-size program, the complexity of both IFDS and IDE depends only on the size of D.

Let us consider a linear constant propagation to be performed on the example program shown in Listing 1.1. Using IFDS, the data-flow domain can be encoded by using pairs from $\mathcal{V} \times \mathbb{Z}$, i.e., pairs of program variables and integer values. However, this strategy leads to a huge domain D and prevents the generation of effective summaries. For each call to inc() in Listing 1.1 with a different input value, a new summary must be generated. In the example, we would obtain summaries {(p, 1) → (<ret>, 2)} for call site *cs1* and {(p, 2) → (<ret>, 3)} for *cs2*.

With IDE, the problem can be encoded in a more elegant and efficient way, by using the set of program variables $\mathcal{V}$ as the data-flow domain and $\mathbb{Z}$ as the value domain. The ESG for a linear constant propagation performed on Listing 1.1 using IDE is shown in Fig. 1. As the context-dependent part of the analysis is encoded using the edge functions, only one summary is generated for the inc() function, λx. x + 1.

Performing a reachability check on the ESG for variable c at line 9, one finds that c can be replaced by the literal 12. Because the return statement is the program's only observable effect, all other statements can be safely removed.
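The role of the edge functions in this example can be sketched with linear functions of the form x ↦ a · x + b. The type `LinearFn` and the helper `andThen` are hypothetical names introduced for illustration only; they do not reflect PhASAR's edge-function interface.

```cpp
// An edge function of the form x -> a * x + b over the integers.
struct LinearFn {
  long a;
  long b;
  long operator()(long x) const { return a * x + b; }
};

// Returns the function "first f, then g":
// g(f(x)) = g.a * (f.a * x + f.b) + g.b.
LinearFn andThen(const LinearFn &f, const LinearFn &g) {
  return LinearFn{g.a * f.a, g.a * f.b + g.b};
}
```

With the single summary `LinearFn{1, 1}` for inc() and `LinearFn{4, 0}` for the multiplication in line 8, the composed function along the path from b to c is x ↦ 4 · x + 4. Evaluating it at the initial value b = 2 yields 12, matching the result stated above, and the same summary maps a = 1 to 2 at call site *cs1*.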

**Fig. 1.** Exploded super graph for the program P in Listing 1.1

#### **4 Architecture**

Precise data-flow analysis requires information from multiple supporting analyses which are typically run earlier, such as class-hierarchy, call-graph, and points-to analysis. Algorithmic frameworks like IFDS provide a generalized algorithm that is then parameterized for each individual data-flow problem. The infrastructure provided by these basic analyses and algorithmic frameworks is necessary to allow analysis designers to efficiently concentrate on the goal of a data-flow analysis. PhASAR is the first framework to provide such infrastructure for programs written in the C/C++ language family. Its infrastructure is designed modularly, such that analysis developers can choose the components necessary for their individual goals. In Fig. 2 we present the high-level architecture of the framework.

We allow PhASAR to be used in multiple ways. The first (and easiest) way is through its command-line interface. Its implementation can be seen as a blueprint to create other tools which use PhASAR. The command-line interface provides a means to execute basic analyses such as call-graph construction or pointer analysis or run pre-defined IFDS/IDE-based analyses. The output of these analyses can then be processed using other tooling or presented to the user directly.

**Fig. 2.** PhASAR's high-level architecture

The command-line interface can also be extended with custom analyses, provided as separately compiled plugins. Currently, custom control-flow or call-graph analyses and custom data-flow analyses can be packaged in this way. The command-line interface acts as the runtime for these plugins and delegates control to the plugin at the appropriate times, providing the necessary information. Plugin providers need to create an implementation of a pre-defined C++ class wrapping their analysis code. The plugin is compiled separately and then provided to PhASAR in the form of a shared object library.

PhASAR can also be included into other tools by using it as a library. This way of using PhASAR provides the most flexibility as developers can freely select the components that should be part of an analysis and can reuse even parts of the components provided by the framework.

PhASAR allows analysis developers to specify arbitrary data-flow problems, which are then solved in a fully-automated manner on the specified LLVM IR target code. Solving a static analysis problem on the IR rather than the source language generally makes the analysis easier: it removes the dependency on the concrete source language, and the IR is usually simpler because it involves no nesting and has fewer instructions. Various compiler frontends for a wide range of languages targeting LLVM IR exist. Hence, PhASAR is able to analyze programs written in languages other than C/C++, too. The framework computes all information required to perform an analysis, such as points-to, call-graph, and type-hierarchy information, and provides additional parameterizable taint and typestate analyses.

PhASAR provides various capabilities and interfaces to compute data-flow problems or aid other types of analyses. First, the framework contains interfaces and implementations for the computation of an ICFG; we provide some parameterizable implementations for the LLVM IR.

Next, PhASAR currently supports the computation of function-wise points-to information using LLVM's implementations of the *Andersen*-style [6] or *Steensgaard*-style [30] algorithms. Points-to information and ICFG computation can be combined to obtain more precise results. We discuss the quality of points-to information and our current efforts to improve it in Sect. 8.

To resolve virtual function calls in C++, we provide means to construct a type hierarchy. We construct the type hierarchy for composite types and reconstruct the virtual-method tables from the IR, which together with the hierarchy information allow PhASAR to resolve potential call targets at a given call-site.
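The following is a minimal sketch of how a type hierarchy and per-type virtual-method tables can together yield the potential targets of a virtual call. It is a toy model under stated assumptions: types and functions are identified by plain strings instead of LLVM entities, and the names `TypeHierarchy`, `addType`, `reachableTypes`, and `resolveVirtualCall` are hypothetical and do not reproduce PhASAR's actual data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: types and functions are identified by name.
struct TypeHierarchy {
  // Direct subtypes of each type.
  std::map<std::string, std::set<std::string>> SubTypes;
  // Virtual-method table of each type: slot index -> function name.
  std::map<std::string, std::vector<std::string>> VTables;

  // Register a type, its direct super type (empty if none), and its vtable.
  void addType(const std::string &Type, const std::string &Super,
               std::vector<std::string> VTable) {
    assert(!Type.empty() && "type name must not be empty");
    if (!Super.empty())
      SubTypes[Super].insert(Type);
    VTables[Type] = std::move(VTable);
  }

  // Collect Type and all of its transitive subtypes.
  std::set<std::string> reachableTypes(const std::string &Type) const {
    std::set<std::string> Seen;
    std::vector<std::string> Worklist{Type};
    while (!Worklist.empty()) {
      std::string Curr = Worklist.back();
      Worklist.pop_back();
      if (!Seen.insert(Curr).second)
        continue; // already visited
      auto It = SubTypes.find(Curr);
      if (It != SubTypes.end())
        Worklist.insert(Worklist.end(), It->second.begin(), It->second.end());
    }
    return Seen;
  }

  // Potential targets of a virtual call through vtable slot Slot on a
  // receiver whose declared (static) type is Type.
  std::set<std::string> resolveVirtualCall(const std::string &Type,
                                           std::size_t Slot) const {
    std::set<std::string> Targets;
    for (const auto &T : reachableTypes(Type)) {
      auto It = VTables.find(T);
      if (It != VTables.end() && Slot < It->second.size())
        Targets.insert(It->second[Slot]);
    }
    return Targets;
  }
};
```

In this sketch, a call through slot 0 on a receiver of declared type `Base` resolves to the slot-0 entries of `Base` and of every transitive subtype, which over-approximates the targets that can occur at runtime.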

PhASAR provides implementations of IDE and IFDS solvers as described by Reps et al. [24] including the extensions of Naeem et al. [20]. We implemented IFDS as a specialization of IDE using a binary lattice that consists only of a top and a bottom element, much like the Heros implementation [7]. Both solvers are accompanied by a corresponding interface for problem definition. To solve a data-flow problem using the IDE or IFDS solver, the data-flow problem must be encoded by implementing this interface. We present this in detail in Sect. 5.

For non-distributive data-flow problems, PhASAR provides an implementation of the traditional monotone framework which allows one to solve intraprocedural problems. The framework provides an inter-procedural version as well that uses a user-specified context in order to differentiate calling-contexts. PhASAR provides a context interface and implementations of this interface that realize the call-strings and value-based approach VASCO [22], in which context-sensitivity is achieved by reusing information that has been computed for previous calls under the same context. The framework also implements a version of the context class to represent a *null context*. This context has the same effect as applying the monotone framework directly in an inter-procedural setting. Both solvers are accompanied by corresponding interfaces for problem descriptions which must be implemented to encode the data-flow problem. The details are provided in Sect. 5.
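To illustrate the idea of a call-strings context, the sketch below models a k-limited call string that can be used as a key for per-context data-flow information. The class name `CallStringContext` and its member functions are hypothetical; they only illustrate the concept and are not PhASAR's context interface.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical k-limited call-string context: it remembers only the K most
// recent call-sites, so two call chains that agree on this suffix are
// analyzed under the same context.
template <typename N, std::size_t K> class CallStringContext {
  std::deque<N> CallSites; // oldest call-site first

public:
  // Entering a callee: append the call-site and drop the oldest if needed.
  void enterCall(N CallSite) {
    CallSites.push_back(CallSite);
    if (CallSites.size() > K)
      CallSites.pop_front();
  }

  // Leaving a callee: forget the most recent call-site.
  void exitCall() {
    assert(!CallSites.empty() && "no call to return from");
    CallSites.pop_back();
  }

  std::size_t size() const { return CallSites.size(); }

  // Contexts are ordered so that they can serve as keys of a std::map that
  // stores the data-flow information computed per context.
  friend bool operator<(const CallStringContext &L,
                        const CallStringContext &R) {
    return L.CallSites < R.CallSites;
  }
  friend bool operator==(const CallStringContext &L,
                         const CallStringContext &R) {
    return L.CallSites == R.CallSites;
  }
};
```

Because only a bounded suffix of the call chain is kept, the number of distinct contexts stays finite even for recursive programs, at the price of merging information from call chains that share the same suffix.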

All of PhASAR's data-flow solvers are implemented in a fully generic manner and heavily make use of templates and interfaces. For instance, a solver follows a target program's control-flow that is specified through an implementation of either the CFG or the ICFG interface. Analysis developers can parameterize a solver with an existing implementation or they can provide their own custom implementation. They can run a forward or backward analysis depending on the direction of the chosen control-flow graph. Moreover, all data-flow related functionality is hidden behind interfaces. A solver queries the required functionality such as flow functions or merge operations for the underlying lattice whenever necessary. We have specified problem interfaces on which the corresponding solver operates. Thus, analysis developers encode their data-flow problem by providing an implementation for the problem interface and provide this implementation to the accompanying solver. PhASAR is able to solve a problem on other IRs when suitable implementations for the IR specific parts such as the control-flow graphs and problem descriptions are provided by the analysis developer.

### **5 Implementation**

Our goal with PhASAR is to ease the formulation of a data-flow analysis such that an analysis developer only needs to focus on the implementation of the problem description rather than on the details of how the problem is solved.

PhASAR achieves parts of its generalizability through template parameters. These template parameters include, among others, N, D, M. They are consistently used throughout the implementation of PhASAR. N denotes the type of a node in the ICFG, i.e., typically an IR statement, D denotes the domain of the data-flow facts, and M is a placeholder for the type of a method/function. When analyzing LLVM IR, N is always of type const llvm::Instruction\* and M is of type const llvm::Function\*, whereas D depends on the specific data-flow analysis that the developer wants to encode. For our example using linear constant propagation described in Sect. 3, D = pair<const llvm::Value \*, int> could be used to capture the property of interest. LLVM's Value type is quite useful as it is a super-type that is located high in the type hierarchy. This allows an analysis developer to use values of all of Value's subtypes in the value domain, which makes it highly flexible.

#### **5.1 Encoding an IFDS Analysis**

Listing 1.2 shows the interface for an IFDS problem. An analysis developer has to define a new type—the problem description—implementing the FlowFunctions interface.

```
template <typename N, typename D, typename M> struct FlowFunctions {
  virtual ~FlowFunctions() = default;
  virtual FlowFunction<D> *getNormalFlowFunction(N curr, N succ) = 0;
  virtual FlowFunction<D> *getCallFlowFunction(N callStmt, M destMthd) = 0;
  virtual FlowFunction<D> *getRetFlowFunction(N callSite, M calleeMthd,
                                              N exitStmt, N retSite) = 0;
  virtual FlowFunction<D> *
  getCallToRetFlowFunction(N callSite, N retSite, set<M> callees) = 0;
};
```
**Listing 1.2.** Interface for specifying flow functions in IFDS/IDE

The flow function factories shown in Listing 1.2 handle the different types of flows. The four factory functions each have an individual purpose:


These flow function factories are automatically queried by the solver, based on the inter-procedural control-flow graph.

The functions in Listing 1.2 are factories since they have to return small function objects of type FlowFunction, which is shown in Listing 1.3. As a FlowFunction is itself an interface, an analysis developer has to provide a suitable implementation. The member function computeTargets() takes a value of a dataflow fact of type D and computes a set of new dataflow facts of the same type. It specifies how the bipartite graph for the statement that represents the flow function is constructed and can be thought of as an answer to the question "What edges must be drawn?".

```
template <typename D> struct FlowFunction {
  virtual ~FlowFunction() = default;
  virtual set<D> computeTargets(D source) = 0;
};
```
**Listing 1.3.** Interface for a flow function in IFDS/IDE

As flow function implementations often follow certain patterns, we provide implementations for the most common patterns as template classes. Many useful flow functions like Gen, GenIf, Kill, KillAll, and Identity are already implemented and can be directly used. Any number of flow functions can be easily combined using our implementations of the Compose and Union flow functions. We also provide MapFactsToCallee and MapFactsToCaller flow functions that automatically map parameters into a callee and back to a caller, since this behavior is frequently desired. Flow functions which are stateless, e.g. Identity or KillAll, are implemented as singletons.
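To make the pattern concrete, the following sketch re-implements three such flow functions on top of a simplified copy of the FlowFunction interface. It is an illustration only: the constructor signatures, the explicit zero-value parameter of `Gen`, and the helper `applyToAll` are assumptions of this sketch, and the templates shipped with PhASAR may differ in names, signatures, and ownership handling.

```cpp
#include <cassert>
#include <set>

// Simplified re-statement of the FlowFunction interface from the text.
template <typename D> struct FlowFunction {
  virtual ~FlowFunction() = default;
  virtual std::set<D> computeTargets(D Source) = 0;
};

// Identity: every fact is passed on unchanged.
template <typename D> struct Identity : FlowFunction<D> {
  std::set<D> computeTargets(D Source) override { return {Source}; }
};

// Gen: generates GenValue whenever the special zero fact is seen; all
// facts, including the zero fact, are passed on.
template <typename D> struct Gen : FlowFunction<D> {
  D GenValue, ZeroValue;
  Gen(D GenValue, D ZeroValue) : GenValue(GenValue), ZeroValue(ZeroValue) {}
  std::set<D> computeTargets(D Source) override {
    if (Source == ZeroValue)
      return {Source, GenValue};
    return {Source};
  }
};

// Kill: removes KillValue and passes on everything else.
template <typename D> struct Kill : FlowFunction<D> {
  D KillValue;
  explicit Kill(D KillValue) : KillValue(KillValue) {}
  std::set<D> computeTargets(D Source) override {
    if (Source == KillValue)
      return {};
    return {Source};
  }
};

// Hypothetical helper: apply a flow function to a whole set of facts and
// collect all target facts, i.e., all edges drawn for one statement.
template <typename D>
std::set<D> applyToAll(FlowFunction<D> &F, const std::set<D> &In) {
  std::set<D> Out;
  for (const D &Fact : In) {
    std::set<D> Targets = F.computeTargets(Fact);
    Out.insert(Targets.begin(), Targets.end());
  }
  return Out;
}
```

Each call to computeTargets() answers the question of which edges leave a single source fact; `applyToAll` merely repeats this for every fact that holds before the statement.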

#### **5.2 Encoding an IDE Analysis**

If an analysis developer wishes to encode their problem within IDE, they have to additionally provide implementations for the edge functions. With the help of the edge functions, an analysis developer is able to specify a computation which is performed along the edges of the exploded super-graph leading to the queried node (cf. Fig. 1). The interface for the edge function factories and their responsibilities are analogous to the flow function factories in Listing 1.2.

Each edge function factory must return an edge function implementation: a small function object similar to a flow function which has a computeTarget() function, a compose, a merge, and an equality-check operation. The EdgeFunction interface is shown in Listing 1.4.

```
template <typename V> class EdgeFunction {
public:
  virtual ~EdgeFunction() = default;
  virtual V computeTarget(V source) = 0;
  virtual EdgeFunction<V> *
  composeWith(EdgeFunction<V> *secondFunction) = 0;
  virtual EdgeFunction<V> *
  joinWith(EdgeFunction<V> *otherFunction) = 0;
  virtual bool equal_to(EdgeFunction<V> *other) const = 0;
};
```
**Listing 1.4.** Interface for an edge function in IDE

As this interface is more complex than the flow function interface, we explain the purpose of each function. The computeTarget() function describes a computation over the value domain V in terms of lambda calculus.

The composeWith() function encodes how to compose two edge functions. In most scenarios, this function can be implemented as (f ◦ g)(x) = f(g(x)). To avoid additional boilerplate code, we provide an EdgeFunctionComposer class that performs this job and can be used as a super class.
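As a small illustration of composition, the sketch below models edge functions of the form x ↦ A·x + B, as they occur in linear constant propagation, and composes two of them symbolically. The type `LinearEdgeFunction` is hypothetical and uses plain values instead of the pointer-based interface of Listing 1.4. It is also an assumption of this sketch that the receiver is applied first and the argument second, as the parameter name secondFunction suggests.

```cpp
#include <cassert>

// Hypothetical edge function for linear constant propagation: x -> A*x + B.
struct LinearEdgeFunction {
  int A;
  int B;

  // Evaluate the function on a concrete value.
  int computeTarget(int Source) const { return A * Source + B; }

  // Assumption of this sketch: the result applies *this first and Second
  // afterwards, i.e., it represents Second(this(x)).
  LinearEdgeFunction composeWith(const LinearEdgeFunction &Second) const {
    // Second(this(x)) = Second.A * (A*x + B) + Second.B
    return {Second.A * A, Second.A * B + Second.B};
  }

  // Two linear functions are equal iff their coefficients agree.
  bool equal_to(const LinearEdgeFunction &Other) const {
    return A == Other.A && B == Other.B;
  }
};
```

Composing symbolically keeps the result in the same compact representation, so a chain of edge functions along a path collapses into a single pair of coefficients.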

joinWith() encodes how to join two edge functions at statements where two control-flow edges lead to the same successor statement. Depending on whether a may- or a must-analysis is performed, implementations of this function typically check which edge function computes a value that is higher up in the lattice, i.e., a more approximate value, and return the corresponding edge function. For our linear constant propagation from Sect. 3, this function would return one of the edge functions if both describe the same value computation, the bottom edge function if both of them encode the ⊥ value, and the edge function encoding the top element otherwise. The intuition here is to always pick the element that is higher in the lattice as it represents more information.

The equal_to() interface function has to be implemented to return true if both edge functions describe the same value computation, and false otherwise.

A complete implementation of the IDE linear constant propagation can be found along with PhASAR's other examples at our website [23].

#### **5.3 Encoding a Monotone Analysis**

If an analysis developer wishes to encode a problem that does not satisfy the distributivity property, they have to make use of the monotone-framework implementation or its inter-procedural variant. The interface for specifying an interprocedural monotone problem is shown in Listing 1.5. Similar to an IFDS/IDE problem, an analysis developer has to specify flow functions for intra- and interprocedural flows. But in contrast to IFDS/IDE, these flow functions do not operate on single, distributive data-flow facts, but on sets of data-flow facts instead. The solver calls the flow functions and provides the set of data-flow facts which hold right before the current statement. The return value to be computed in the flow function is a set of data-flow facts that hold after the effects of the current statement. The join() function specifies how information is merged when two branches join at a common successor statement. This is typically implemented as set-union or set-intersection depending on whether a may or must-analysis has to be solved. Algorithms from C++'s STL may be used here. Finally, the sqSubSetEqual() function must be implemented to determine if the amount of information between two sets has increased in order to check if a fixpoint is reached. The context that is used for the inter-procedural analysis can be specified by the analysis developer using the template parameter. An analysis developer can provide a pre-defined context class in order to parameterize the analysis to be a call-strings approach, a value-based approach, or they can define their own context to be used.

```
template <typename N, typename D, typename M, typename I>
struct InterMonotoneProblem {
  I ICFG; // inter-procedural control-flow graph the problem is solved on
  InterMonotoneProblem(I Icfg) : ICFG(Icfg) {}
  virtual ~InterMonotoneProblem() = default;
  virtual set<D> join(const set<D> &Lhs, const set<D> &Rhs) = 0;
  virtual bool sqSubSetEqual(const set<D> &Lhs, const set<D> &Rhs) = 0;
  virtual set<D> normalFlow(N Stmt, const set<D> &In) = 0;
  virtual set<D> callFlow(N CallSite, M Callee, const set<D> &In) = 0;
  virtual set<D> returnFlow(N CallSite, M Callee, N RetStmt, N RetSite,
                            const set<D> &In) = 0;
  virtual set<D> callToRetFlow(N CallSite, N RetSite,
                               const set<D> &In) = 0;
};
```
**Listing 1.5.** Interface for describing an interprocedural problem for the monotone framework
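The join and ordering operations described above can be sketched with standard library algorithms as follows. The free functions `joinMay` and `joinMust` are hypothetical names for the two typical choices of join(); a concrete problem description would implement exactly one of them as its join() member, and the ordering shown here is the plain subset relation.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>

// May-analysis: information from both branches is kept (set union).
template <typename D>
std::set<D> joinMay(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  std::set<D> Result;
  std::set_union(Lhs.begin(), Lhs.end(), Rhs.begin(), Rhs.end(),
                 std::inserter(Result, Result.end()));
  return Result;
}

// Must-analysis: only facts that hold on both branches survive
// (set intersection).
template <typename D>
std::set<D> joinMust(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  std::set<D> Result;
  std::set_intersection(Lhs.begin(), Lhs.end(), Rhs.begin(), Rhs.end(),
                        std::inserter(Result, Result.end()));
  return Result;
}

// Subset ordering: true iff every fact in Lhs is also contained in Rhs.
template <typename D>
bool sqSubSetEqual(const std::set<D> &Lhs, const std::set<D> &Rhs) {
  return std::includes(Rhs.begin(), Rhs.end(), Lhs.begin(), Lhs.end());
}
```

Since std::set keeps its elements sorted, the preconditions of std::set_union, std::set_intersection, and std::includes are met without any additional sorting step.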

#### **5.4 Handling of Intrinsic and Libc Function Calls**

LLVM currently has approximately 130 intrinsic functions. These functions are used to describe semantics in the analysis and optimization phase and do not have an actual implementation. Later-on in the compiler pipeline, the back-end is free to replace a call to an intrinsic function with a software or a hardware implementation – if one exists for the target architecture. Introducing new intrinsic functions is preferred over introducing novel instructions to LLVM since, when introducing a new instruction, all optimizations, analyses, and tools built on top of LLVM have to be revisited to make them aware of the new instruction. A call to an intrinsic function can be handled as an ordinary function call.

The functions contained in the libc standard library represent special targets as well, as these functions are used by virtually all practical C and C++<sup>1</sup> programs. Moreover, the functions contained in the standard library cannot be analyzed themselves as they are mostly very thin wrappers around system calls and are often not available for the analysis. In many cases, however, it is not necessary to analyze these functions when performing a data-flow analysis. PhASAR models all of them as the identity function. An analysis developer can change the default behavior and model different effects by using special summary functions. The SpecialSummaries class can be used to register flow and edge functions other than identity. This class is aware of all intrinsic and libc functions.

<sup>1</sup> The compiler translates many of C++'s features into ordinary calls to libc.

#### **5.5 A Note on Soundness**

Livshits et al. have introduced the notion of *soundy* analyses [18]. Soundy analyses use sensible underapproximations to cope with certain language features that would otherwise make an analysis impractically imprecise. Analyses in PhASAR are currently *soundy*. For instance, PhASAR's ICFG misses one control-flow edge in the presence of setjmp()/longjmp(). Functions that are loaded dynamically from shared object libraries using dlsym() cannot be handled either. PhASAR's data-flow solvers treat calls to dynamically loaded libraries and libraries for which function definitions are missing as identity, unless the analysis developer specifies otherwise. A sound handling would be to set all variables involved in such calls to ⊤, which, again, may lead to large imprecision.

### **6 Scalability**

In this section, we present the runtime measurements for two concrete static analyses that are both implemented in PhASAR: IFDSSolverTest, which we name I, and IFDSTaintAnalysis, which we name T. I is a trivial IFDS analysis which passes the tautological data-flow fact Λ through the program. The analysis acts as a baseline as it is the most efficient IFDS/IDE analysis that can possibly be implemented. T implements a taint analysis. A taint analysis tracks values that have been tainted by one or more sources through the program and reports whenever one of the tainted values reaches a sink, which can be a function or an instruction. Our taint analysis treats the command-line parameters argc and argv that are passed into the main() function as tainted. Functions that read values from the outside (e.g. fread()) are interpreted as sources. Functions that can leak tainted variables to the outside such as printf() or fwrite() are considered sinks. As a potentially large number of tainted values has to be tracked through the program, analysis T will provide insights into the scalability of PhASAR's IFDS/IDE solver implementation.

Table 1 shows the programs that we analyzed. For each program, the IR's lines of code, number of statements, pointers, and allocation sites have been measured with PhASAR. The LLVM IR has been compiled with the Clang compiler using production flags. The figures give an intuition for the program's complexity. The programs that we analyzed comprise C programs, namely some of the coreutils [3], as well as two C++ programs: PhASAR itself and the PhASAR-based tool MPT. In addition, Table 1 shows the runtimes of the analyses I and T separated into different phases (in the format runtime I/runtime T). We measured the runtimes for the construction of points-to information (PT), class hierarchy (CH), call-graph (CG), data-flow information (DF), and the total runtime (Σ). We also measured the number of function summaries ψ(f) that could be reused while solving the analysis. The latter is a good indicator for the quality of the data-flow domain D, as higher reuse indicates a more efficient analysis. #G and #K denote the number of facts that have been generated or killed in the taint analysis, respectively.

We measured the runtimes by performing 15 runs for each analysis on a virtual machine running on an Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30 GHz machine with 128 GB memory. We removed the minimum and maximum values and computed the average of the remaining 13 values for each of the four analysis steps and the total runtime. We used an on-the-fly call-graph algorithm that uses points-to information for the coreutils. For PhASAR and MPT, we used a declared-type analysis (DTA) call-graph algorithm in order to reduce the amount of memory required to reproduce our results. In addition, we found that DTA performed well enough on our C++ target programs.
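The averaging scheme described above (discard the smallest and the largest measurement, then average the rest) can be sketched as follows. The function name `trimmedMean` is hypothetical and the sketch is not taken from the evaluation scripts used for the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Average of the samples after discarding one minimum and one maximum.
// Hypothetical helper; the name is not taken from the paper.
inline double trimmedMean(std::vector<double> Samples) {
  assert(Samples.size() > 2 && "need at least three samples");
  std::sort(Samples.begin(), Samples.end());
  // Skip the first (smallest) and the last (largest) element.
  double Sum = std::accumulate(Samples.begin() + 1, Samples.end() - 1, 0.0);
  return Sum / static_cast<double>(Samples.size() - 2);
}
```

For 15 measurements, the function averages the 13 values that remain after trimming, which damps the influence of a single outlier run in either direction.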

With one exception, PhASAR is able to analyze a program from coreutils within a few seconds. Analyzing cp using T takes around 13 min. This is because a large number of facts are generated, which must then be propagated by the solver. This result shows the cubic impact of the number of data-flow facts on IFDS/IDE's complexity. Analyzing the million-line programs PhASAR and MPT takes between 7 and 18 min. As one can observe for PhASAR, an analysis may destroy data-flow facts more often than it generates them. This is caused by C++'s exceptional control-flow where the same fact is destroyed during normal and exceptional flow.
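For reference, the well-known worst-case bound of the IFDS tabulation algorithm makes this dependence explicit. Here $|E|$ denotes the number of edges of the inter-procedural control-flow graph and $|D|$ the size of the data-flow fact domain:

$$
O\left(|E| \cdot |D|^{3}\right)
$$

Under this bound, doubling the number of data-flow facts while keeping the program fixed can increase the worst-case cost by a factor of up to $2^{3} = 8$.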

We observed that the DF part of T actually runs faster than I for our C++ target programs. This is because T should behave very similarly to IFDSSolverTest for the C++ target programs, as only very few facts are actually generated. Furthermore, T will take shortcuts whenever it plugs in the desired effects at call-sites of source and sink functions. I, in contrast, follows these calls, making it slower than T.


**Table 1.** Program's characteristics and performance figures for analyses I/T

Analyzing all of the 97 coreutils, PhASAR, and MPT requires a total analysis time of 30 min for I and 1 h and 31 min for T. These measurements show that PhASAR is capable of analyzing even a million-line program within minutes, even though PhASAR's algorithms and data structures have not yet undergone manual optimization.

### **7 Guidelines for the Analysis on Real-World Code**

In this section, we share our experience in analyzing real-world C/C++ programs. Although the LLVM IR is expressive enough to capture arbitrary source languages, we found that the characteristics and complexity of the source language propagate into the IR. Observe the following call-site in LLVM IR:

%retval = call i32 %fptr(%class.S\* dereferenceable(4) %ptr, i32 5)

Assuming C to be the source language, a plain function pointer is called. If C++ is the source language, we cannot be sure whether a function pointer or a virtual member function of class S is called. This is the reason why we observed that the analysis runtime for C++ target programs is usually much higher than for C programs.
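The ambiguity can be reproduced with a few lines of C++. In the sketch below, which uses hypothetical names throughout, one call goes through a plain function pointer and the other through the virtual-method table. A compiler such as Clang typically lowers both to a call through a value that is only known at runtime; the exact IR depends on the compiler version and the flags used.

```cpp
#include <cassert>

class S {
public:
  int Offset = 1;
  virtual ~S() = default;
  virtual int scale(int X) { return X + Offset; }
};

class T : public S {
public:
  int scale(int X) override { return X * 2; }
};

// An ordinary function whose address can be stored in a function pointer.
inline int freeFunction(S &Obj, int X) { return X - Obj.Offset; }

// Indirect call through a plain function pointer.
inline int callViaFunctionPointer(int (*FPtr)(S &, int), S &Obj) {
  return FPtr(Obj, 5);
}

// Indirect call through the virtual-method table of Obj; the callee
// depends on the dynamic type of Obj.
inline int callViaVirtualDispatch(S &Obj) { return Obj.scale(5); }
```

For the first call, an analysis has to consider the functions whose address may flow into the pointer; for the second, it has to consult the type hierarchy and the virtual-method tables of S and its subtypes.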

For more complex languages like C++ we have to keep track of special member functions. These functions are mapped into ordinary LLVM IR functions that Clang places in a well-defined order in the generated IR. For some analyses like the declared-type analysis (DTA) call-graph algorithm, we need to be aware of these special member functions in order to preserve high precision.

We also found that even a well-debugged analysis that has been hardened on a large variety of test programs may still fail on production code as some corner cases have not been thought of. The large amount of information available to an analysis run makes debugging errors hard. A standard debugger does not suffice because an analysis writer has to step through a lot of code that is not relevant for them. For Java, a special dedicated debugger for static analysis has been developed [21] which shows the relevance of the problem.

Depending on the optimization passes that have been applied to code in LLVM IR before it is handed over to the analysis, it may have very different characteristics. Although optimization passes are required to have no impact on the semantics, the structure of the IR code changes. In our experience, it is helpful to start developing an analysis on small test programs that are translated into IR without optimization passes, and cover as many cases as the analysis should find. Once an analysis handles these test cases correctly or with the desired precision, optimization passes should be applied to the test cases. After rerunning the analysis the results should be checked against their unoptimized version. When applying an analysis to production code, the code should be compiled using production flags in order to analyze code that is as close as possible to what actually runs on the machine.

We found that the usage of debug symbols is helpful. The Clang compiler's -g flag can be added to propagate the debug symbols into the IR. Those can then be queried using LLVM's corresponding API. However, the debug symbols may not always be present, which is why an analysis should not rely on them.

### **8 Future Work**

In this section we briefly summarize our plans for future improvements.

It would be interesting to evaluate the use of PhASAR for analyzing a different IR. One type of IR might have advantages over others for different analysis problems. We plan to additionally support the GENERIC, GIMPLE and RTL [5,19] IR from the GCC project.

Another interesting framework for data-flow analysis is *Weighted Pushdown Systems* (WPDS) [25,28]. WPDS is able to compute an analysis within a stack automaton. WPDS allows for more compact data structures, the generation of witnesses, as well as precise queries specifying paths of interest using regular expressions. We plan to support WPDS in a future version of PhASAR using the weighted/nested-word automaton library [32].

Checking the correctness of an IFDS/IDE analysis is complex since checking the correctness of the underlying ESG is tedious and time consuming. A high quality visualization may help reduce the amount of time spent debugging an analysis. A graphical user interface will reduce the amount of knowledge that is required to use the framework.

Since the flow and edge functions have to be implemented in a general purpose programming language, they require some amount of boilerplate code. It remains an open question if one could design a non-Turing-complete EDSL with a library like boost::proto [1] which simplifies the task of encoding analysis problems.

PhASAR currently uses LLVM's points-to information which is rather imprecise. We plan to integrate a more precise pointer analysis into PhASAR to support more precise call-graph construction and client analyses by adapting the demand-driven Boomerang approach presented in [29] to PhASAR.

### **9 Conclusion**

In this paper, we presented our implementation of a static analysis framework for programs written in C/C++ named PhASAR. We presented its architecture and implementation from a user's perspective to make practical static analysis more accessible. We presented experiments which have shown PhASAR's scalability and discussed the runtimes of the key parts of two concrete client analyses.

With PhASAR we strive toward the goals of providing a framework for static analysis targeting (but not limited to) C/C++, a base for quickly evaluating novel ideas and applications, and a suitable way of handling the complexity. PhASAR is open-source and available online [23] under the permissive MIT licence, and therefore, open for contributions, feedback and use. PhASAR has already received tremendous support in the research community and from practitioners as 223 stars and 26 forks on GitHub show.<sup>2</sup>

<sup>2</sup> As of 8am February 07, 2019.

**Acknowledgments.** This work was partially supported by the German Research Foundation (DFG) within the Collaborative Research Centre "On-The-Fly Computing" (SFB 901) and the Heinz Nixdorf Foundation. We would also like to thank Richard Leer for his assistance in developing and improving the framework.

### **References**


**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

### Author Index

Abate, Alessandro II-247 Amparore, Elvio Gilberto II-285 André, Étienne II-211 Arcak, Murat II-265 Baarir, Souheib I-135 Babar, Junaid II-303 Bakhirkin, Alexey II-79 Barbon, Gianluca I-386 Basset, Nicolas II-79 Becker, Nils I-99 Belmonte, Gina I-281 Beneš, Nikola II-339 Biere, Armin I-41 Bisping, Benjamin I-244 Blanchette, Jasmin Christian I-192 Bläsius, Thomas I-117 Blicha, Martin I-3 Bloemen, Vincent II-211 Bodden, Eric II-393 Bozga, Marius II-3 Bozzano, Marco I-379 Brain, Martin I-79 Brim, Luboš II-339 Bruintjes, Harold I-379 Bunte, Olav II-21 Butkova, Yuliya II-191 Castro, Pablo F. II-375 Cauchi, Nathalie II-247 Češka, Milan II-172 Chen, Taolue I-155 Chen, Yu-Fang I-365 Christakis, Maria I-226 Ciancia, Vincenzo I-281 Ciardo, Gianfranco II-285, II-303 Cimatti, Alessandro I-379 Cruanes, Simon I-192

D'Argenio, Pedro R. II-375 Dawes, Joshua Heneage II-98 de Vink, Erik P. II-21 Demasi, Ramiro II-375 Donatelli, Susanna II-285

Eles, Petru I-299 Enevoldsen, Søren I-316 Esparza, Javier II-154 Fox, Gereon II-191 Franzoni, Giovanni II-98 Friedrich, Tobias I-117 Fulton, Nathan I-413 Ganjei, Zeinab I-299 Gao, Pengfei I-155 Gligoric, Milos I-174 Govi, Giacomo II-98 Groote, Jan Friso II-21 Guldstrand Larsen, Kim I-316 Gupta, Aarti I-351 Gupta, Rahul I-59 Hahn, Christopher II-115 Hahn, Ernst Moritz I-395 Hartmanns, Arnd I-344 Hasuo, Ichiro II-135 Heizmann, Matthias I-226

Henrio, Ludovic I-299 Hermann, Ben II-393 Heule, Marijn J. H. I-41 Huang, Bo-Yuan I-351 Hyvärinen, Antti E. J. I-3

Iosif, Radu II-3

Jansen, Nils II-172 Jiang, Chuan II-303 Junges, Sebastian II-172

Katelaan, Jens II-319 Katoen, Joost-Pieter I-379, II-172 Keiren, Jeroen J. A. II-21 Khaled, Mahmoud II-265 Khurshid, Sarfraz I-174 Kiesl, Benjamin I-41 Kim, Eric S. II-265 Klauck, Michaela I-344 Kofroň, Jan I-3

Konnov, Igor II-357 Kordon, Fabrice I-135 Kosmatov, Nikolai I-358 Kura, Satoshi II-135

Latella, Diego I-281 Laveaux, Maurice II-21 Le Frioux, Ludovic I-135 Le Gall, Pascale I-358 Lee, Insup I-213 Leroy, Vincent I-386 Li, Yong I-365 Liu, Si II-40

Majumdar, Rupak II-229 Malik, Sharad I-351 Mansur, Muhammad Numair I-226 Massink, Mieke I-281 Matheja, Christoph II-319 Meel, Kuldeep S. I-59 Meijer, Jeroen II-58 Meseguer, José II-40 Meßner, Florian I-337 Meyer, Philipp J. II-154 Miner, Andrew II-285, II-303 Müller, Peter I-99

Neele, Thomas II-21 Nestmann, Uwe I-244 Noll, Thomas I-379

Offtermatt, Philip II-154 Ölveczky, Peter Csaba II-40 Osama, Muhammad I-21

Pajic, Miroslav I-213 Park, Junkil I-213 Parker, David I-344 Pastva, Samuel II-339 Peng, Zebo I-299 Perez, Mateo I-395 Petrucci, Laure II-211 Pfeiffer, Andreas II-98 Piterman, Nir II-229 Platzer, André I-413 Prevosto, Virgile I-358 Putruele, Luciano II-375

Quatmann, Tim I-344

Reger, Giles II-98 Rezine, Ahmed I-299 Rilling, Louis I-358 Robles, Virgile I-358 Roy, Subhajit I-59 Ruijters, Enno I-344 Saarikivi, Olli I-372 Šafránek, David II-339 Salaün, Gwen I-386 Schanda, Florian I-79 Schewe, Sven I-395 Schilling, Christian I-226 Schmuck, Anne-Kathrin II-229 Schubert, Philipp Dominik II-393 Schulz, Stephan I-192 Sharma, Shubham I-59 Sharygina, Natasha I-3 Sifakis, Joseph II-3 Sokolsky, Oleg I-213 Somenzi, Fabio I-395 Song, Fu I-155 Sopena, Julien I-135 Srba, Jiří I-316 Stenger, Marvin II-115 Sternagel, Christian I-262, I-337 Stoilkovska, Ilina II-357 Summers, Alexander J. I-99 Sun, Xuechao I-365 Sun, Youcheng I-79 Sutton, Andrew M. I-117

Tentrup, Leander II-115 Tonetta, Stefano I-379 Trivedi, Ashutosh I-395 Turrini, Andrea I-365

Urabe, Natsuki II-135

van de Pol, Jaco II-58, II-211 van Dijk, Tom II-58 Veanes, Margus I-372 Vukmirović, Petar I-192

Wan, Tiki I-372 Wang, Kaiyuan I-174 Wang, Qi II-40 Wang, Wenxi I-174 Wesselink, Wieger II-21 Widder, Josef II-357

Wijs, Anton I-21, II-21 Willemse, Tim A. C. II-21 Wojtczak, Dominik I-395 Wüstholz, Valentin I-226

Xie, Hongyi I-155 Xu, Eric I-372 Xu, Junnan I-365

Yamada, Akihisa I-262

Zamani, Majid II-265 Zhang, Hongce I-351 Zhang, Jun I-155 Zhang, Min II-40 Zuleger, Florian II-319, II-357